It’s mainly (I think) the distribution of instruction lengths. x86 instructions can be anywhere from 1 to 15 bytes long. A decoder wants to decode multiple instructions per cycle, and it generally does this in parallel, by simultaneously decoding at multiple starting points. With a fixed-length ISA, to decode n instructions, you just decode them. With x86, if you simultaneously decode at offsets 0, 1, …, 7, you have 8 decoders but are only likely to get a couple of correct instructions out; the rest start in the middle of an instruction and have to be discarded. So you either need many more parallel decoders for the same throughput, or a more complex system to avoid throwing away so much work.
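To make the wasted work concrete, here's a toy model in C. The length encoding is made up (the low bits of the first byte give the length), not real x86, and `toy_length` is just a stand-in: decode speculatively at every offset of an 8-byte fetch window, then keep only the decodes that landed on real instruction boundaries.

```c
/* Toy model of brute-force wide decode on a variable-length ISA, to show
 * the wasted work: speculatively decode at every byte offset of a fetch
 * window, then keep only the decodes on real instruction boundaries.
 * The length function is a stand-in for a made-up ISA, not real x86.     */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define WINDOW 8

/* Stand-in length decoder: low 4 bits of the first byte give the length
 * (1..15 bytes, mirroring x86's 1..15 byte range).                        */
static size_t toy_length(const uint8_t *p)
{
    size_t n = p[0] & 0x0F;
    return n ? n : 1;
}

int main(void)
{
    uint8_t window[WINDOW] = {0x03, 0xAA, 0xBB, 0x02, 0xCC, 0x01, 0x04, 0xDD};
    size_t  len[WINDOW];
    bool    valid[WINDOW] = {false};

    /* 1. Speculative decode at every offset (all in parallel in hardware). */
    for (size_t i = 0; i < WINDOW; i++)
        len[i] = toy_length(&window[i]);

    /* 2. Resolve the real boundaries serially from offset 0; only those
     *    decodes are kept, the rest started mid-instruction.               */
    for (size_t pos = 0; pos < WINDOW; pos += len[pos])
        valid[pos] = true;

    for (size_t i = 0; i < WINDOW; i++)
        printf("offset %zu: %s\n", i, valid[i] ? "kept" : "discarded");
    return 0;
}
```

In this example half of the eight speculative decodes get thrown away, which is the throughput penalty (or area penalty, if you add more decoders to compensate) being described above.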
I’m sure this is doable, but I would certainly count it as “complex”.
But fundamentally, a given chip, dedicating a given area to the task, can only begin decoding at so many positions per cycle. And the more intelligent it tries to be about where to start decoding, the more of that cycle it burns before decoding can even begin.
And one nastiness about x86 is that you have to decode pretty far into an instruction to even determine its likely length. You can’t do something simple like looking up the likely length in a table indexed by the first byte of an instruction, because prefixes, escape bytes, ModRM/SIB, displacements, and immediates can all affect it.
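To make that concrete, here's a heavily simplified sketch in C of the length-determination dependency chain. Only two opcodes are handled, 64-bit mode is assumed, and 0x0F escapes, VEX, and EVEX are ignored entirely, so treat it as illustrative rather than a real decoder.

```c
/* Heavily simplified sketch of x86 length determination, to show the
 * serial dependency chain: prefixes -> (REX) -> opcode -> ModRM -> SIB ->
 * displacement -> immediate. Two opcodes only, 64-bit mode assumed,
 * no 0x0F escapes / VEX / EVEX.                                           */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67:                       /* operand/address size    */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP        */
    case 0x26: case 0x2E: case 0x36: case 0x3E:
    case 0x64: case 0x65:                       /* segment overrides       */
        return true;
    default:
        return false;
    }
}

/* Returns the instruction length in bytes, or 0 for opcodes this sketch
 * doesn't know about (i.e. almost all of them).                           */
size_t x86_length_sketch(const uint8_t *p)
{
    size_t i = 0;
    bool opsize_66 = false;

    /* 1. Can't know anything until all prefixes have been skipped.        */
    while (is_legacy_prefix(p[i])) {
        if (p[i] == 0x66)
            opsize_66 = true;
        i++;
    }

    /* 2. REX (0x40..0x4F) is only known to be a prefix after step 1.      */
    bool rex_w = false;
    if ((p[i] & 0xF0) == 0x40) {
        rex_w = (p[i] & 0x08) != 0;
        i++;
    }

    uint8_t opcode = p[i++];

    if (opcode == 0x01) {               /* ADD r/m32, r32                  */
        /* 3. Only the ModRM byte tells you whether a SIB byte and a
         *    displacement follow, and how big the displacement is.        */
        uint8_t modrm = p[i++];
        uint8_t mod = modrm >> 6, rm = modrm & 7;

        if (mod != 3 && rm == 4) {              /* SIB byte present        */
            uint8_t sib = p[i++];
            if (mod == 0 && (sib & 7) == 5)     /* SIB with no base        */
                i += 4;
        }
        if (mod == 1)                           /* disp8                   */
            i += 1;
        else if (mod == 2)                      /* disp32                  */
            i += 4;
        else if (mod == 0 && rm == 5)           /* RIP-relative disp32     */
            i += 4;
        return i;
    }

    if (opcode >= 0xB8 && opcode <= 0xBF) {     /* MOV reg, imm            */
        /* 4. The immediate size depends on prefixes seen back in steps
         *    1 and 2 (imm16 with 0x66, imm64 with REX.W, else imm32).     */
        return i + (rex_w ? 8 : (opsize_66 ? 2 : 4));
    }

    return 0;
}
```

Even in this stripped-down form, the length falls out of a chain of dependent lookups rather than a single table indexed by the first byte, which is exactly what makes wide x86 decode expensive.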
I wonder whether modern chips have pipelined decoders.
All modern chips have pipelined decoders, including ARM ones. For example, the Cortex-A72 has three decode stages, and that's for a 3-wide decoder running at relatively low clock speeds.
So you could conceivably afford a wide frontend if you restricted x86 to a subset (64-bit only, drop a bunch of weird CISC-y instructions).