Just to nitpick, backwards compatibility isn't really a huge issue for Intel. Most of the really old stuff that's a pain to maintain can be shoved in microcode; compilers won't emit those instructions.
There are obvious downsides to the architecture, but the need for backwards compatibility shouldn't hurt it too much.
GPU workloads are very different in that you generally don't have to look particularly hard to find a bunch of parallelism to exploit (if you did have to, your code would run terribly on a GPU), so you can usually gain a load of performance just by scaling up your design.
CPUs are super restricted by the single threaded, branching nature of the code you run on them, and this is what makes CPU performance a little more nuanced, and not directly comparable.
That's not really true; backwards compatibility on x86 architectures takes a tremendous amount of power and die space, and the 'throw it in microcode' solution only partially mitigates this issue.
I can't remember where I read it but something like 30+% of an Intel CPU die area/power consumption is due to the x86 ISA. Apparently the original Pentium CPU was 40% instruction decoding by die area. And the ISA has grown enormously since then.
"CPUs are super restricted by the single threaded, branching nature of the code you run on them, and this is what makes CPU performance a little more nuanced, and not directly comparable."
Ironically, to really hit peak performance of a modern AVX2 or later CPU, you have to embrace many of the design principles that lead to efficient GPU code:
1. Multiple threads per core to make use of the dual vector units introduced in Haswell
2. SIMD-like thinking to remap tasks into the 8-way and soon to be 16-way vector units
3. Running multiple threads across multiple cores
4. Micromanaging the L1 cache and treating the AVX/SSE registers as L0 cache
Where the CPU prevails is for fundamentally serial algorithms that cannot be mapped into a SIMD implementation. Mike Acton's Data-Oriented Design covers this case nicely IMO.