> designing MRISC32 have made me appreciate the clever encoding tricks that go into fitting all instructions into 32 bits.
Keep in mind that the average RISC ISA uses 5 bit registers IDs and uses three-arg form for most instructions, that's 15 bits gone. While AMD64 uses 4 bit register IDs and uses two-arg form for most instructions, which is only 8 bits.
Also, the encoding scheme that Agner describes is not a fixed width encoding. It's variable width with 16bit, 32bit, 48bit and 64bit uops. There are also some (hopefully rare) uops which don't fit in the uop cache's encoding scheme (forcing a fallback to the instruction decoders). Those two relief valves allow such an encoding to avoid the needs for the complex encoding tricks of a proper fixed width encoding.
So I find the scheme to be plausible, though what you say about decoders after the uOP cache is true.
> Keep in mind that the average RISC ISA uses 5 bit registers IDs and uses three-arg form for most instructions, that's 15 bits gone. While AMD64 uses 4 bit register IDs and uses two-arg form for most instructions, which is only 8 bits.
Yes, I'm aware of that (that's what I meant with "destructive operands").
But then you have AVX-512 that has 32 architectural registers, and some instructions support three register operands (LEA, VEX-encoding, ...). So there you have the choice of having multiple encodings that will be more compact, or a few simple encodings that will require wider encoding.
Since x86 isn't known for having a small and consistent instruction set, any attempts to streamline the post-decode encoding will likely gravitate towards a wider encoding (i.e. worst case least common denominator).
It would be an interesting exercise to try to fit all the existing x86 instructions (post-cracking) into an encoding of 32, 48 or 64 bits, for instance. But I don't envy the Intel & AMD engineers.
I've had "attempt to implement a wide x86 instruction decoder on an FPGA" on my list of potential hobby projects for a long time. Though I was only planning to do the length decoder. My understanding is that length decoding is the critical path and any additional decoding after that is simple from a timing perspective (and very complex from every other perspective).
But now you are making want to experiment with compact uop encodings too.
Keep in mind that the average RISC ISA uses 5 bit registers IDs and uses three-arg form for most instructions, that's 15 bits gone. While AMD64 uses 4 bit register IDs and uses two-arg form for most instructions, which is only 8 bits.
Also, the encoding scheme that Agner describes is not a fixed width encoding. It's variable width with 16bit, 32bit, 48bit and 64bit uops. There are also some (hopefully rare) uops which don't fit in the uop cache's encoding scheme (forcing a fallback to the instruction decoders). Those two relief valves allow such an encoding to avoid the needs for the complex encoding tricks of a proper fixed width encoding.
So I find the scheme to be plausible, though what you say about decoders after the uOP cache is true.