Couldn't Intel just come out with a new set of reduced-complexity instructions, enabled per process by a mode bit that gets flipped on context switches? Then legacy apps would run fine, but the new stuff would work too. This doesn't seem that hard to address.
As I understand it, the obstacle to making wider x86 chips is the mere existence of some instructions. Adding new instructions can't help with that, because the decoder still has to handle the old ones. But I'm just repeating what I heard elsewhere:
> Other contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs (Intel is 1+4) that is seemingly limited from going wider at this point in time due to the ISA’s inherent variable instruction length nature, making designing decoders that are able to deal with aspect of the architecture more difficult compared to the ARM ISA’s fixed-length instructions.
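To make the variable-length point concrete, here is a rough sketch. The encoding is a made-up toy (the low 4 bits of the first byte give the instruction length), not real x86, and the function names are mine. It only shows why the start of instruction N depends on the lengths of all the instructions before it, while fixed-length boundaries are known up front.

```python
def boundaries_variable(code: bytes) -> list[int]:
    """Find instruction start offsets in a toy variable-length stream.

    Hypothetical encoding (NOT real x86): the low 4 bits of an
    instruction's first byte hold its total length in bytes.
    """
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        length = code[pc] & 0x0F
        if length == 0:
            raise ValueError(f"invalid length at offset {pc}")
        # The next start is only known after this length is decoded,
        # so the scan is a serial chain of dependencies.
        pc += length
    return offsets


def boundaries_fixed(code: bytes, width: int = 4) -> list[int]:
    """Fixed-length stream: every start offset is known immediately."""
    return list(range(0, len(code), width))


variable_stream = bytes([0x02, 0xAA, 0x01, 0x03, 0xBB, 0xCC])
print(boundaries_variable(variable_stream))
print(boundaries_fixed(bytes(16)))
```

Real decoders attack this with speculation and length pre-decoding rather than a literal loop, but the dependency chain in the first function is the part that gets more expensive as you try to decode more instructions per cycle.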
I find that odd. Don’t they have some sort of icache? Intel could decode into a fixed-width alternative instruction set inside the icache, then use a wider decoder when actually executing.
Yes, they have a cache for decoded operations. It holds a limited number of them, and it's somewhat inefficient: the fixed-width decoded ops are a lot larger than the variable-length instructions they came from, so not many fit. And because it doesn't help on code with a large footprint that spends little time in inner loops, in a balanced design you don't necessarily want the number of ops you can get from it per cycle to be much more than the width of the rest of the system.
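A back-of-the-envelope sketch of the capacity point. The sizes here are placeholders I picked for illustration (a 32 KiB budget, 4 bytes per raw instruction, 12 bytes per decoded op), not figures for any real chip; the only thing it shows is how the ratio of entry sizes cuts the number of instructions that fit.

```python
def instructions_held(cache_bytes: int, bytes_per_entry: int) -> int:
    """How many entries of a given size fit in a storage budget."""
    return cache_bytes // bytes_per_entry


# All three numbers below are illustrative assumptions, not measurements.
BUDGET_BYTES = 32 * 1024      # same storage budget for both cases
RAW_INSN_BYTES = 4            # assumed average variable-length instruction
DECODED_OP_BYTES = 12         # assumed fixed-width decoded op

raw_capacity = instructions_held(BUDGET_BYTES, RAW_INSN_BYTES)
decoded_capacity = instructions_held(BUDGET_BYTES, DECODED_OP_BYTES)

print(raw_capacity, decoded_capacity)
```

With these made-up sizes the decoded form covers about a third as much code in the same storage, which is why it pays off in tight loops and not on code with a large footprint.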