Is that the bandwidth from Intel's published ARK spec sheet, or actual usable bandwidth? There is a difference.
I've found I get about 75-80% of the advertised bandwidth, both from my real app (TLS crypto) and from a toy memory-copy benchmark using 256-bit AVX instructions. That toy benchmark is how I realized that my bottleneck on Broadwell-based servers was actually memory bandwidth, not CPU horsepower.
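For anyone wanting to try the measured-vs-advertised comparison themselves, here is a rough single-threaded stand-in for that kind of toy benchmark. It's a Python sketch, not the AVX code described above: it assumes CPython performs contiguous memoryview slice assignment as one bulk memcpy, the 128 MiB default buffer is an arbitrary size meant to be much larger than the last-level cache, and a single thread usually won't saturate a multi-channel server, so treat the result as a lower bound.

```python
import time


def copy_bandwidth_gbs(size_bytes=128 * 1024 * 1024, repeats=5):
    """Best-of-N memory-copy bandwidth in GB/s (10**9 bytes per second).

    Follows the STREAM "Copy" convention of counting both the bytes read
    and the bytes written, so one copy of N bytes counts as 2*N.
    """
    # Non-zero source so every page is actually backed by RAM.
    src = bytearray(b"\xa5") * size_bytes
    dst = bytearray(size_bytes)
    src_view, dst_view = memoryview(src), memoryview(dst)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst_view[:] = src_view  # contiguous slice assignment: a bulk copy in CPython
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero timer delta on tiny buffers
    return 2 * size_bytes / best / 1e9


def fraction_of_peak(measured_gbs, peak_gbs):
    """Measured bandwidth as a fraction of the advertised peak."""
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    bw = copy_bandwidth_gbs()
    print(f"single-thread copy: {bw:.1f} GB/s")
```

Divide the printed figure by your platform's spec-sheet number with fraction_of_peak to get the same kind of percentage quoted above.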
I haven't a clue; I just took it from the quoted link. It's a good question, and when there's a difference, you know what marketing will say.
To take a stab at it, I suppose it might depend on whether all requests go to a single memory bank or are spread evenly across all of them, assuming every channel is fully populated (again from the link: "Octa-channel (up from hexa-channel)").
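A toy model of the "single bank vs. spread evenly" question, framed per memory channel. It assumes a hypothetical simple interleave in which consecutive 64-byte cache lines rotate round-robin across channels; real Intel address mappings are hashed and more complex, so this only shows the shape of the argument. Under that model a sequential stream touches every channel, while a stride of exactly channels × 64 bytes lands on a single channel and would be limited to that one channel's share of the peak.

```python
def channel_of(addr, channels=8, granularity=64):
    """Hypothetical round-robin interleave: line index modulo channel count."""
    return (addr // granularity) % channels


def channels_touched(addresses, channels=8, granularity=64):
    """How many distinct channels an access pattern hits under this model."""
    return len({channel_of(a, channels, granularity) for a in addresses})


# Sequential cache lines vs. a stride of 8 lines (8 channels x 64 bytes).
sequential = [i * 64 for i in range(1024)]
strided = [i * 64 * 8 for i in range(1024)]

print(channels_touched(sequential), channels_touched(strided))
```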
On Intel, usable bandwidth is dominated by how many cores/threads are accessing memory simultaneously, so with many scientific libraries you can get about 80% of the theoretical throughput. The V100 won't give you 900 GB/s either: that's the theoretical peak, but in practice it's about 750 GB/s.
https://en.wikichip.org/wiki/intel/microarchitectures/cooper...
says "Higher bandwidth (174.84 GiB/s, up from 119.209 GiB/s)"
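Both quoted figures fall out of the usual peak formula, channels × transfer rate × 8 bytes per 64-bit channel, if you assume DDR4-2933 for the eight-channel part and DDR4-2666 for the six-channel one. Those speeds are my assumption, not something stated in the quote, but they reproduce both numbers exactly. Note the figures are in GiB/s, so 174.84 GiB/s is about 187.7 GB/s. The last column applies the 75-80% rule of thumb from upthread, which is only an extrapolation from Broadwell.

```python
GIB = 2 ** 30  # bytes in a GiB
GB = 10 ** 9   # bytes in a GB


def peak_bandwidth_bytes(channels, mega_transfers_per_s, bus_width_bytes=8):
    """Theoretical peak in bytes/s: channels x transfers/s x bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * bus_width_bytes


# Assumed speeds: "DDR4-2933" and "DDR4-2666" are rounded names; the actual
# transfer rates are 8800/3 and 8000/3 MT/s.
cooper = peak_bandwidth_bytes(8, 8800 / 3)
previous = peak_bandwidth_bytes(6, 8000 / 3)

for name, peak in (("8-channel", cooper), ("6-channel", previous)):
    print(f"{name}: {peak / GIB:.2f} GiB/s = {peak / GB:.1f} GB/s, "
          f"~{0.75 * peak / GB:.0f}-{0.80 * peak / GB:.0f} GB/s at 75-80%")
```

This should print 174.84 GiB/s (187.7 GB/s) for the eight-channel case and 119.21 GiB/s (128.0 GB/s) for the six-channel case.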
I don't know if memory bandwidth matters for this type of job, though.