Wow, this is a wonderfully rich post! I had a question about the following statement:
>"In short, it’s a mess, with each generation adding and removing functionality, reusing or overloading instructions and instruction prefixes, and introducing increasingly complicated switching mechanisms between supported modes and privilege boundaries."
Can someone elaborate on how an instruction at the machine level can be "overloaded"? At the machine level, how can an instruction be mapped to more than one entry in the microcode table? Or does this mean overloading in the regular programming sense, something like an ADD instruction capable of working with ints, strings, etc.?
> Can someone elaborate on how an instruction at the machine level can be "overloaded"? At the machine level, how can an instruction be mapped to more than one entry in the microcode table?
Yep! Instruction overloading can occur in a few different senses:
1. As different valid permutations of operands and prefixes, e.g. `mov`
2. As having totally different functionalities in different privilege or architecture modes
3. As being repurposed entirely (e.g., the one-byte inc/dec r32 encodings, 0x40-0x4F, are REX prefixes in 64-bit mode)
Instruction-to-microcode translation is, unfortunately, not as simple as a (single) table lookup on x86_64 ;)
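To make the third sense concrete, here is a rough Python sketch of how the same byte reads differently depending on the mode. `classify_byte` and its `long_mode` flag are made-up names for illustration, and the non-64-bit branch assumes a 32-bit default operand size; it is nowhere near a real decoder.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def classify_byte(byte: int, long_mode: bool) -> str:
    """Describe a byte in the 0x40-0x4F range (hypothetical helper)."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40-0x4F are modelled here")
    if long_mode:
        # In 64-bit mode the low four bits are the REX flags W, R, X, B.
        w, r, x, b = ((byte >> shift) & 1 for shift in (3, 2, 1, 0))
        return f"REX prefix (W={w} R={r} X={x} B={b})"
    # Outside 64-bit mode: 0x40-0x47 is inc r32, 0x48-0x4F is dec r32,
    # with the register number in the low three bits.
    op = "inc" if byte < 0x48 else "dec"
    return f"{op} {REG32[byte & 7]}"

for mode in (False, True):
    print(hex(0x48), "long mode" if mode else "32-bit mode", "->", classify_byte(0x48, mode))
```

The point is only that the decoder needs the current mode as an input before it can say what a byte even is.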
Try reading Appendix A of the x86 instruction manual, which lists the opcode maps of the x86 instruction set. Section A.4 in particular gives strong insight into the answer to your question: several opcodes are distinguished only by the register bits in the ModR/M byte.
For example, opcode 0f01 is actually several opcodes depending on the ModR/M byte. If the ModR/M byte indicates a memory operand, then it's an SGDT, SIDT, LGDT, LIDT, SMSW, LMSW, or INVLPG instruction (depending on the value of the 3-bit reg field). If it's a register-register form, it can be any one of 17 other instructions depending on the pair of registers.
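A small Python sketch of that lookup, in case it helps. The memory-form table follows the mnemonics named above, with SMSW in the /4 slot as the Intel opcode map has it. The register-form table is deliberately partial and only lists a handful of encodings I'm sure of. `split_modrm` and `decode_0f01` are hypothetical helper names.

```python
# reg field -> mnemonic when ModR/M selects a memory operand (mod != 3).
# /5 is left out here and reported as reserved.
GROUP7_MEMORY = {
    0: "SGDT", 1: "SIDT", 2: "LGDT", 3: "LIDT",
    4: "SMSW", 6: "LMSW", 7: "INVLPG",
}

# Full ModR/M byte -> mnemonic for a few register forms (mod == 3).
# Partial on purpose; the real map has more entries.
GROUP7_REGISTER = {
    0xC1: "VMCALL", 0xC8: "MONITOR", 0xC9: "MWAIT",
    0xD0: "XGETBV", 0xF8: "SWAPGS", 0xF9: "RDTSCP",
}

def split_modrm(modrm: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return modrm >> 6, (modrm >> 3) & 7, modrm & 7

def decode_0f01(modrm: int) -> str:
    """Name the instruction selected by the byte following 0F 01."""
    mod, reg, _rm = split_modrm(modrm)
    if mod != 3:
        return GROUP7_MEMORY.get(reg, "(reserved)")
    return GROUP7_REGISTER.get(modrm, "(not in this partial table)")

for byte in (0x10, 0x38, 0xD0, 0xF8):
    print(f"0F 01 {byte:02X} -> {decode_0f01(byte)}")
```

So the two opcode bytes alone don't identify the instruction; the decoder has to look at a third byte that would normally just describe operands.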
Overloading and variable-length instructions are the two big reasons that I can think of, off the top of my head. That's pushing the limits of my understanding of how decoding and microcode generation work on-silicon, though.
More importantly, the XCHG instruction does NOT affect the flags. So the upshot is that the instruction `XCHG accumulator,accumulator` is effectively a no-operation, and yes, Intel does document the encoding for NOP as being
10010000
or `XCHG accumulator,accumulator`. Early chips in the x86 line (8086, 80286) actually did the physical exchange, but as the architecture gained out-of-order (OoO) execution, the chips now detect 0x90 as a NOP instruction and handle it differently (most likely to prevent pipeline stalls waiting for the results of `XCHG accumulator,accumulator`).
It gets worse with x86_64 though. Normally, an instruction that writes a 32-bit register clears the upper 32 bits of the corresponding 64-bit register. The NOP is special, because it shouldn't do anything, so 0x90 leaves RAX untouched. All the other 32-bit XCHG instructions still clear the upper 32 bits of the 64-bit registers they write.
So the NOP really is different, and a normal XCHG EAX,EAX is not possible with the 0x90 encoding. You can get a normal XCHG EAX,EAX via the 2-byte ModRM form of the instruction.
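Here's a toy Python model of that difference, assuming 64-bit mode with the default 32-bit operand size. It only understands three encodings (90, 87 C0, and 91), and the `execute` / `xchg32` helpers are invented for this sketch, not taken from any real emulator.

```python
MASK32 = 0xFFFFFFFF

def write32(regs: dict, name: str, value: int) -> None:
    """A 32-bit register write zero-extends into the full 64-bit register."""
    regs[name] = value & MASK32

def xchg32(regs: dict, a: str, b: str) -> None:
    """Swap the low 32 bits of two registers; both writes zero-extend."""
    low_a, low_b = regs[a] & MASK32, regs[b] & MASK32
    write32(regs, a, low_b)
    write32(regs, b, low_a)

def execute(regs: dict, code: bytes) -> None:
    """Run one of the three modelled encodings against the register file."""
    if code == b"\x90":
        return                      # NOP: special-cased, touches nothing
    if code == b"\x87\xc0":
        xchg32(regs, "rax", "rax")  # 2-byte ModRM form of XCHG EAX,EAX
    elif code == b"\x91":
        xchg32(regs, "rcx", "rax")  # XCHG ECX,EAX (0x90 + register number 1)
    else:
        raise ValueError("encoding not modelled in this sketch")

for code in (b"\x90", b"\x87\xc0"):
    regs = {"rax": 0xFFFFFFFF12345678, "rcx": 0}
    execute(regs, code)
    print(code.hex(), "->", hex(regs["rax"]))
```

Under this model the one-byte encoding leaves the upper half of RAX alone, while the two-byte form wipes it, which is the "really is different" part.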