> > Also, I like how returning 0 is "xor eax, eax". > -O1 is `mov eax, 0` Simply...

3chelon · on Nov 28, 2016

It's not just shorter, it's also faster. But see my answer also: there are condition flag implications of using XOR and sometimes MOV will be preferable. The optimiser will always know best :)
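To make the "shorter" part concrete, here is a minimal sketch in C that writes out the standard 32-bit x86 encodings of the two instructions and compares their sizes. The names `XOR_EAX_EAX`, `MOV_EAX_0`, `bytes_saved_by_xor` and `return_zero` are made up for illustration. Which form a compiler actually emits for `return_zero` depends on the compiler and optimisation level.

```c
#include <stddef.h>
#include <stdint.h>

/* Machine-code encodings of the two ways to zero EAX (32-bit operand size). */
const uint8_t XOR_EAX_EAX[] = { 0x31, 0xC0 };                   /* xor eax, eax */
const uint8_t MOV_EAX_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0   */

/* Number of code bytes saved by picking the XOR idiom over the MOV form. */
size_t bytes_saved_by_xor(void) {
    return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX;
}

/* A function like this is where the compiler has to pick one of the two.
   XOR overwrites the condition flags and MOV leaves them alone, so MOV is
   the safe choice whenever a later instruction still needs the flags. */
int return_zero(void) {
    return 0;
}
```

The XOR form is 2 bytes and the MOV form is 5, so `bytes_saved_by_xor()` evaluates to 3.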

wolfgke · on Nov 28, 2016

> there are condition flag implications of using XOR and sometimes MOV will be preferable

If the condition flags have to be preserved, you are right. But otherwise, read the linked article (https://randomascii.wordpress.com/2012/12/29/the-surprising-...):

"On Sandybridge this gets even better. The register renamer detects certain instructions (xor reg, reg and sub reg, reg and various others) that always zero a register. In addition to realizing that these instructions do not really have data dependencies, the register renamer also knows how to execute these instructions – it can zero the registers itself. It doesn’t even bother sending the instructions to the execution engine, meaning that these instructions use zero execution resources, and have zero latency! See section 2.1.3.1 of Intel’s optimization manual where it talks about dependency breaking idioms. It turns out that the only thing faster than executing an instruction is not executing it."

Sharlin · on Nov 28, 2016

It's fascinating how far down the rabbit hole goes these days. One might think machine code as emitted by compilers would be pretty close to where the buck stops, but no. Named registers are just an abstraction on top of a larger register pool, opcodes get JIT compiled and optimized into micro-ops, execution order is mostly just a hint for the processor to ignore if it can get things done faster by reordering or parallelizing... And memory access is probably the greatest illusion of all.

wolfgke · on Nov 28, 2016

What I also find rather interesting is the concept of macro-op fusion that Intel introduced with the Core 2 processors: a cmp ... (or test ...) followed by a conditional jump can be fused into a single micro-op. In other words, a sequence of two instructions suddenly maps to one internal micro-op. If you are interested in the details, read section 8.5 of http://www.agner.org/optimize/microarchitecture.pdf
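As a rough illustration of where such pairs come from, here is a small C sketch. The function name `count_zero_bytes` is hypothetical, and the assembly in the comments is only what a compiler might plausibly emit for a plain scalar loop. An optimising compiler may vectorise the loop or use a branchless sequence instead, and whether a given cmp/jump pair actually fuses depends on the microarchitecture and the operand forms.

```c
#include <stddef.h>

/* Counts the zero bytes in buf[0..len). */
size_t count_zero_bytes(const unsigned char *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {  /* loop test: e.g. cmp + jb         */
        if (buf[i] == 0) {              /* byte test: e.g. test/cmp + jne   */
            count++;
        }
    }
    return count;
}
```

Each compare-and-branch in the source is a candidate for fusion, so a loop that looks like four instructions of control flow in the assembly listing may cost the core only two micro-ops for it.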