I’m sure this is doable, but I would certainly count it as “complex”.
But fundamentally, a given chip, dedicating a given area to the task, can only begin decoding at so many positions per cycle. And the more intelligent it tries to be about where to start decoding, the longer into the cycle it has to wait before it can begin.
And one nastiness about x86 is that you have to decode fairly far into an instruction just to determine its length: prefixes, the opcode, the ModRM byte, an optional SIB byte, and any displacement or immediate fields all affect it. You can't do something simple like looking up the length in a table indexed by the first byte of the instruction.
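To make that concrete, here's a toy sketch (not a real decoder) of computing the length of just one x86 opcode, 0x83 (ALU on r/m32 with an 8-bit immediate). Even for this single opcode, the length depends on the ModRM byte that follows, which may in turn pull in a SIB byte and a displacement, so the first byte alone tells you nothing about where the next instruction starts:

```python
def length_0x83(insn: bytes) -> int:
    """Length of an instruction starting with opcode 0x83
    (simplified: no prefixes, 64-bit addressing assumed)."""
    assert insn[0] == 0x83
    modrm = insn[1]
    mod = modrm >> 6
    rm = modrm & 7
    n = 2                       # opcode + ModRM
    if mod != 3 and rm == 4:    # SIB byte present
        n += 1
    if mod == 1:                # 8-bit displacement
        n += 1
    elif mod == 2:              # 32-bit displacement
        n += 4
    elif mod == 0 and rm == 5:  # RIP-relative: 32-bit displacement
        n += 4
    return n + 1                # plus the imm8

# Same first byte, different lengths:
print(length_0x83(bytes([0x83, 0xC0, 0x05])))              # add eax, 5            -> 3 bytes
print(length_0x83(bytes([0x83, 0x44, 0x24, 0x10, 0x05])))  # add dword [rsp+16], 5 -> 5 bytes
```

Multiply this by hundreds of opcodes, plus prefixes that can change operand size (and hence immediate width), and you can see why a wide x86 decoder either speculates on instruction boundaries or brute-forces decoding at many byte offsets in parallel.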
I wonder whether modern chips have pipelined decoders.
All modern chips have pipelined decoders, including ARM ones. For example, the Cortex-A72 has three decode stages, even though it's only a 3-wide decoder running at relatively low clock speeds.
IIRC Jim Keller said in some interview that modern x86 decoders use prediction and speculation to guess instruction boundaries (similar in spirit to branch prediction), and it works surprisingly well.