That sounded a bit low. This https://en.wikichip.org/wiki/intel/microarchitectur...

drewg123 · on July 21, 2019

Is that the bandwidth from Intel's published ARK spec sheet, or the actual usable bandwidth? There is a difference.

I've found I get about 75-80% of the advertised bandwidth, both from my real app (TLS crypto) and from a toy memory copy benchmark using 256-bit AVX instructions. The toy memory copy benchmark is how I realized that my bottleneck on Broadwell-based servers was actually memory bandwidth and not CPU horsepower.
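For anyone who wants to try something similar, here is a minimal sketch of a toy copy benchmark. It is not the AVX benchmark described above: it is single-threaded Python that relies on CPython doing a same-size `bytearray` slice assignment as one bulk copy, and the function name, buffer size, and repeat count are all made up for illustration. Expect it to understate what a tuned multi-threaded native benchmark can reach.

```python
import time


def copy_bandwidth_gbps(size_bytes=256 * 1024 * 1024, repeats=5):
    """Return the best observed copy bandwidth in GB/s (hypothetical helper).

    Traffic is counted as read + write, i.e. 2 * size_bytes per copy.
    """
    # Non-zero fill so the source pages are actually backed by memory.
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one bulk copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"{copy_bandwidth_gbps():.1f} GB/s (single thread, best of 5)")
```

Taking the best of several runs hides first-touch page faults. The buffers need to be much larger than the last-level cache, otherwise this measures cache bandwidth rather than DRAM bandwidth.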

tempguy9999 · on July 21, 2019

I haven't a clue. I just got it off the link quoted. It's a good question and when there's a difference, you know what marketing will say.

To take a stab, I suppose it might depend on whether all requests are coming from a single memory bank or are spread evenly across all memory banks, assuming the channels are fully populated (again from the link: "Octa-channel (up from hexa-channel)").
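As a rough sketch of where a spec-sheet number comes from: theoretical peak DRAM bandwidth is usually computed as channels times transfer rate times bus width. The function names and the DDR4 speeds below are illustrative assumptions, not figures taken from the linked page.

```python
def peak_bandwidth_gbps(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (hypothetical helper).

    Assumes every channel is populated and running at the stated speed.
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def usable_bandwidth_gbps(peak_gbps, efficiency):
    """Scale a peak figure by an observed efficiency fraction, e.g. 0.75-0.80."""
    return peak_gbps * efficiency


if __name__ == "__main__":
    # Assumed configurations, for illustration only.
    quad = peak_bandwidth_gbps(channels=4, mega_transfers_per_s=2400)
    octa = peak_bandwidth_gbps(channels=8, mega_transfers_per_s=3200)
    print(f"4 x DDR4-2400: {quad:.1f} GB/s peak")
    print(f"8 x DDR4-3200: {octa:.1f} GB/s peak")
    print(f"4 x DDR4-2400 at 80%: {usable_bandwidth_gbps(quad, 0.80):.1f} GB/s")
```

If only some channels are populated, or traffic is concentrated on one channel, the `channels` factor effectively shrinks, which is one way the usable figure ends up well below the advertised one.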

shaklee3 · on July 22, 2019

On Intel, achievable bandwidth is dominated by how many cores/threads are accessing memory simultaneously. So with many scientific libraries you can get about 80% of the theoretical throughput. And the V100 will not give you 900GBps. That's the theoretical peak; in practice it's about 750GBps.

jlebar · on July 22, 2019

My project, XLA, will get you quite close to the nominal 900GB/s. :)

shaklee3 · on July 23, 2019

Do you have more information?