I'm not sure how this makes sense. Tuning C++ for speed is mostly about weeding out cache misses by fixing memory access patterns. If memory is accessed linearly, the prefetcher will fetch it ahead of time. If you take cache sizes into account, you can cut down not only on memory latency but on memory bandwidth use as well.
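To make that concrete, here is a rough sketch of what I mean. The function names, the flat row-major layout, and the default tile size of 32 are all my own assumptions for illustration; the right tile size depends on the cache you are targeting and should be measured, not taken from here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers. The matrix is rows x cols, stored row-major
// in one contiguous vector: element (r, c) lives at m[r * cols + c].

// Linear access: the inner loop walks consecutive addresses, which is
// the pattern a hardware prefetcher is built to recognize.
long long sum_linear(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Strided access: same result, but the inner loop jumps `cols` elements
// per step, so on a large matrix most reads land on a different cache line.
long long sum_strided(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// Cache blocking: transpose in block x block tiles so the source and
// destination lines touched by one tile can stay in cache while in use.
// The block size is an assumed tuning parameter.
std::vector<int> transpose_blocked(const std::vector<int>& src, std::size_t rows,
                                   std::size_t cols, std::size_t block = 32) {
    assert(src.size() == rows * cols);
    assert(block > 0);
    std::vector<int> dst(src.size());
    for (std::size_t rb = 0; rb < rows; rb += block)
        for (std::size_t cb = 0; cb < cols; cb += block)
            for (std::size_t r = rb; r < std::min(rb + block, rows); ++r)
                for (std::size_t c = cb; c < std::min(cb + block, cols); ++c)
                    dst[c * rows + r] = src[r * cols + c];
    return dst;
}
```

Both sum functions return the same value; the only difference is the order in which memory is touched. Whether the strided version is actually slower, and by how much, depends on the matrix size and the hardware, so time it on your own machine before drawing conclusions.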