> BTW, have you heard that the next Haswell processors can get only 5% performance improvement?
Bollocks. Using the new AVX2 gather instructions, I've seen close to a 40% increase over IB (Ivy Bridge) on some floating-point code I hand-wrote with intrinsics.
Existing C++ code is around 13-16% faster thanks to better cache bandwidth and a huge L4 cache. Turn on FMA (fused multiply–add) optimisation and that goes to ~20% faster.
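For anyone wondering what FMA actually changes, here's a minimal sketch in portable C++. It only shows the numerical side (one rounding instead of two), not the speedup, and the function names `mul_add_unfused` and `mul_add_fused` are made up for illustration. `std::fma` is the standard library call; whether it compiles down to a single hardware instruction depends on your compiler and flags (with GCC or Clang, something like `-mfma` or `-march=haswell`).

```cpp
#include <cmath>

// Unfused: the product is rounded to double first, then the sum is rounded again.
// The volatile temporary is only there to stop the compiler from quietly
// contracting the expression into a fused multiply-add.
double mul_add_unfused(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused: std::fma computes a * b + c with a single rounding at the end.
double mul_add_fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 in a double. So the unfused version returns 0, while the fused one keeps the low bits and returns -2^-54.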