Wow, this a wonderfully rich post! I had a question about the following statemen...

woodruffw · on Oct 31, 2019

Thanks for the kind words!

> Can someone elaborate on how a instruction at the machine level can be "overloaded"? At this machine level how can an instruction be mapped to more than one entry in the microcode table?

Yep! Instruction overloading can occur in a few different senses:

1. As different valid permutations of operands and prefixes, e.g. `mov`

2. As having totally different functionalities in different privilege or architecture modes

3. As being repurposed entirely (e.g., inc/dec r32 are now REX prefixes)

Instruction-to-microcode translation is, unfortunately, not as simple as a (single) table lookup on x86_64 ;)

bogomipz · on Oct 31, 2019

Thanks for the examples. This is helpful. I can't help but wonder if you or anyone else might be to elaborate on your last point:

>"Instruction-to-microcode translation is, unfortunately, not as simple as a (single) table lookup on x86_64 ;)"

Is the because of the overloading or are there other reasons it's not as simple as a LUT?

Cheers.

jcranmer · on Oct 31, 2019

Reading Appendix A of the x86 instruction manual, which lists the opcode maps of the x86 instruction set. Section A.4 in particular gives strong insight into the answer to your question--several opcodes are represented by varying the register bits in the Mod/RM byte.

For example, opcode 0f01 is actually several opcodes depending on the Mod/RM byte. If the Mod/RM byte indicates a memory operand, then it's a SGDT, SIDT, LGDT, LIDT, LMSW, or INVLPG instruction (depending on what the first register number is). If it's a register-register form, it can be any one of 17 other instructions depending on the pair of registers.

woodruffw · on Oct 31, 2019

Overloading and variable-length instructions are the two big reasons that I can think of, off the top of my head. That's pushing the limits of my understanding of how decoding and microcode generation work on-silicon, though.

spc476 · on Oct 31, 2019

A very simple example of this is the accumulator form of XCHG (exchange values). The accumulator form (AX/EAX/RAX is the accumulator) is encoded as:

    10010rrr

    rrr = register
    000 = AX/EAX/RAX
    001 = CX/ECX/RCX
    010 = DX/EDX/RDX
    011 = BX/EBX/RBX
    100 = SP/ESP/RSP
    101 = BP/EBP/RBP
    110 = SI/ESI/RSI
    111 = DI/EDI/RDI

More inportantly, the XCHG instruction does NOT affect the flags. So the upshot is that the instruction `XCHG accumulator,accumulator` is effectively a no-operation, and yes, Intel does document the encoding for NOP as being

10010000

or `XCHG accumulator,accumulator`. Early chips in the x86 line (8086, 80286) actually did the physical exchange, but as the architecture grew with OoO operations, the chips now detect 0x90 as a NOP instruction and deal with it differently (most likely to prevent pipeline stalls waiting for the results of `XCHG accumulator,accumulator').

souprock · on Nov 1, 2019

It gets worse with x86_64 though. Normally, an instruction that isn't 64-bit will clear the upper 32 bits of a 64-bit register. The NOP is special, because it shouldn't do anything. All the other 32-bit XCHG instructions still clear the upper 32 bits of a 64-bit register.

So the NOP really is different, and a normal XCHG EAX,EAX is not possible with the 0x90 encoding. You can get a normal XCHG EAX,EAX via the 2-byte ModRM form of the instruction.