> > Also, I like how returning 0 is "xor eax, eax".
> -O1 is `mov eax, 0`
Simply because it is shorter: On x86-64 (and x86-32)
xor eax,eax
encodes as
31h C0h or 33h C0h
(depending on the assembler; typically the first one is used) - 2 bytes, while
mov eax,0x0
encodes as
B8h 00h 00h 00h 00h
- 5 bytes.
Having privately analyzed some 256-byte demos, I cannot imagine why anyone would use `mov r32, imm32` to zero a register (except that people don't want to understand how assembly code is encoded internally). The canonical way is `xor`; `sub` also works in principle, but `xor` is what Intel recommends.
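For anyone who wants to check the byte counts above by hand, here is a rough sketch in Python. The helper names (`modrm`, `encode_xor_self`, `encode_mov_imm32`) are made up for illustration, and it only handles the eight legacy 32-bit registers (no REX prefix), so treat it as a toy rather than a real assembler.

```python
# Toy encoder for two ways of zeroing a 32-bit register on x86.
# Helper names are hypothetical; only registers 0..7 (eax..edi) are handled.

EAX = 0  # register number of eax in the ModRM / opcode encoding


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM byte: 2 bits mod, 3 bits reg, 3 bits r/m."""
    return (mod << 6) | (reg << 3) | rm


def encode_xor_self(reg: int) -> bytes:
    """xor reg, reg using opcode 31h (XOR r/m32, r32), register-direct mode."""
    return bytes([0x31, modrm(0b11, reg, reg)])


def encode_mov_imm32(reg: int, imm: int) -> bytes:
    """mov reg, imm32 using opcode B8h+reg followed by a little-endian imm32."""
    return bytes([0xB8 + reg]) + imm.to_bytes(4, "little")


xor_bytes = encode_xor_self(EAX)
mov_bytes = encode_mov_imm32(EAX, 0)

print(xor_bytes.hex(" "), "-", len(xor_bytes), "bytes")
print(mov_bytes.hex(" "), "-", len(mov_bytes), "bytes")
```

The alternative 33h C0h encoding uses the other operand direction (XOR r32, r/m32), which yields the same ModRM byte when both operands are the same register.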
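It's not just shorter, it's also faster. But see my answer as well: XOR modifies the condition flags while MOV leaves them alone, so MOV is sometimes preferable (for example, when a register has to be zeroed between a comparison and the instruction that consumes its flags). The optimiser will always know best :)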
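To make the flag difference concrete, here is a tiny Python model. It is only a sketch: the `State` class and the two functions are invented for illustration, and it tracks just four flags (PF is left out, and AF is undefined after XOR anyway).

```python
# Toy model of how the two zeroing idioms treat the condition flags.
# State and both functions are hypothetical names, not a real emulator API.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    eax: int
    zf: bool  # zero flag
    cf: bool  # carry flag
    sf: bool  # sign flag
    of: bool  # overflow flag


def xor_eax_eax(s: State) -> State:
    """xor eax, eax: result is 0, so ZF is set and CF, SF, OF are cleared."""
    return State(eax=0, zf=True, cf=False, sf=False, of=False)


def mov_eax_0(s: State) -> State:
    """mov eax, 0: writes the register and leaves every flag untouched."""
    return replace(s, eax=0)


# Pretend an earlier cmp left "not equal, borrow" in the flags.
after_cmp = State(eax=5, zf=False, cf=True, sf=True, of=False)

print(mov_eax_0(after_cmp))    # flags from the cmp survive
print(xor_eax_eax(after_cmp))  # flags from the cmp are overwritten
```

So if a conditional jump or `setcc` still needs the result of the earlier comparison, the zeroing either has to happen before the comparison or has to use `mov`.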
"On Sandybridge this gets even better. The register renamer detects certain instructions (xor reg, reg and sub reg, reg and various others) that always zero a register. In addition to realizing that these instructions do not really have data dependencies, the register renamer also knows how to execute these instructions – it can zero the registers itself. It doesn’t even bother sending the instructions to the execution engine, meaning that these instructions use zero execution resources, and have zero latency! See section 2.1.3.1 of Intel’s optimization manual where it talks about dependency breaking idioms. It turns out that the only thing faster than executing an instruction is not executing it."
It's fascinating how far down the rabbit hole goes these days. One might think machine code as emitted by compilers would be pretty close to where the buck stops, but no. Named registers are just an abstraction on top of a larger register pool, opcodes get JIT compiled and optimized to microcode instructions, execution order is mostly just a hint for the processor to ignore if it can get things done faster by reordering or parallelizing... And memory access is probably the greatest illusion of all.
What I also find rather interesting is the concept of macro-op fusion, which Intel introduced with the Core 2 processors: for example, a `cmp ...` (or `test ...`) followed by a conditional jump can be fused into a single micro-op. In other words, a sequence of two instructions suddenly maps to one internal micro-op. If you are interested in the details, read section 8.5 in http://www.agner.org/optimize/microarchitecture.pdf
EDIT: Here is an article about that topic: https://randomascii.wordpress.com/2012/12/29/the-surprising-...