Sure, but we're talking about RPC/syscall for disk and network transfers from the CPU side. Almost nothing on the CPU side can sustain 1 TB/s anyway. You can only hit that with GPU->GPU transforms, and even then only for very specific workloads. And the only reason you reach off the GPU at all is that you either need new data or need the CPU to chew on something the GPU can't manage due to branchiness.
A PCIe 4.0 x16 link gives roughly 32 GB/s of bandwidth in each direction; an RTX 4090 has over 1 TB/s of bandwidth to its on-card memory.
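
To put the two figures side by side, here is a rough back-of-the-envelope sketch. It uses only the nominal numbers quoted above (32 GB/s and a round 1000 GB/s as a floor for "over 1 TB/s"); the 24 GB payload size and the helper name `transfer_seconds` are my own assumptions for illustration, and real sustained throughput will be lower on both sides.

```python
# Nominal peak figures from the comment above; sustained rates are lower.
PCIE4_X16_GB_PER_S = 32.0    # PCIe 4.0 x16, per direction
VRAM_GB_PER_S = 1000.0       # RTX 4090 on-card memory, rounded down from "over 1 TB/s"


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb at the given bandwidth (hypothetical helper)."""
    return size_gb / bandwidth_gb_per_s


# How many times faster the on-card memory is than the host link.
ratio = VRAM_GB_PER_S / PCIE4_X16_GB_PER_S

# Assumed payload: 24 GB, chosen only as an example size.
payload_gb = 24.0
over_pcie = transfer_seconds(payload_gb, PCIE4_X16_GB_PER_S)
over_vram = transfer_seconds(payload_gb, VRAM_GB_PER_S)

print(f"on-card memory is ~{ratio:.0f}x the PCIe link")
print(f"{payload_gb:.0f} GB over PCIe: {over_pcie:.2f} s, from VRAM: {over_vram:.3f} s")
```

So anything that has to cross the link pays about a 30x penalty relative to staying on the card, before any RPC or syscall overhead is counted.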