Yep, we generally care about growing a few bandwidth numbers over what we have today:
- GPU<>CPU/RAM
- GPU<>storage
- GPU<>network
(- GPU<>GPU bandwidth is already insane, as is GPU compute speed)
These matter for cases like logs, where there is ~infinite off-GPU data (S3, storage, ...), yet the current PCIe/CPU path is a tiny straw that bottlenecks everything.
It's now ~easy to do stuff like regex search on the GPU, so seeing systems redesigned to quickly shove 1TB through a Python one-liner is awesome.
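To make that concrete, here's a minimal sketch of the kind of one-liner I mean, assuming a CUDA box with RAPIDS cuDF installed (the file name, column, and pattern are made up):

    import cudf

    # GPU CSV reader: bytes land in GPU memory, parsing runs on-device
    logs = cudf.read_csv("logs.csv")

    # the "one-liner": a regex over every row, executed on the GPU
    errors = logs[logs["message"].str.contains(r"ERROR|FATAL", regex=True)]

    print(len(errors))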
To get a feel for where all this is in practice, I did a fun talk w/ the Pavilion team for this year's GTC on building Graphistry UIs & interactive dashboards on top of this: https://pavilion.io/nvidia/
Edit: A good search term here is 'GPUDirect Storage', which is explicitly about skipping the CPU indirection and its bandwidth handcuffs. Tapping directly into the network or storage is super exciting for matching what the compute tier can do!
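For the storage side specifically, RAPIDS' kvikio library exposes NVIDIA's cuFile/GDS API from Python. A rough sketch, assuming kvikio + CuPy are installed and GDS is configured (the path and buffer size are made up):

    import cupy
    import kvikio

    # destination buffer already lives in GPU memory
    buf = cupy.empty(1024 * 1024, dtype=cupy.uint8)

    # with GDS enabled, the read DMAs storage -> GPU,
    # skipping the usual CPU bounce buffer
    with kvikio.CuFile("big.log", "r") as f:
        n = f.read(buf)  # number of bytes read

    print(n)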