Is this really that crazy? A 64-bit immediate value takes 8 bytes, so 8 bytes of that 10-byte instruction are the value to load into the register. Similarly, a 32-bit immediate value takes 4 bytes, so 4 bytes of each of the other instructions are the immediate itself. Taking this into account, the non-immediate overhead is 1 byte for movl (opcode), 3 bytes for movq (REX prefix, opcode and ModRM) and 2 bytes for movabsq (REX prefix and opcode).
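For reference, here is a minimal Python sketch that hand-assembles the three forms with %eax/%rax as the destination, using the standard opcode forms B8+rd, REX.W C7 /0 and REX.W B8+rd, and subtracts the immediate size from each length. The helper names are hypothetical; this is not an assembler API.

```python
import struct

# Hypothetical helpers that hand-assemble "mov $imm, %eax/%rax" for x86-64.

def movl_imm32_eax(imm):
    # B8+rd id: register number folded into the opcode, then a 4-byte immediate
    return bytes([0xB8]) + struct.pack("<I", imm)

def movq_simm32_rax(imm):
    # REX.W C7 /0 id: REX prefix, opcode, ModRM (mod=11, /0, rm=rax),
    # then a 4-byte immediate that the CPU sign-extends to 64 bits
    return bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", imm)

def movabsq_imm64_rax(imm):
    # REX.W B8+rd io: REX prefix, opcode, then an 8-byte immediate
    return bytes([0x48, 0xB8]) + struct.pack("<Q", imm)

cases = [
    ("movl", movl_imm32_eax(0x12345678), 4),
    ("movq", movq_simm32_rax(0x12345678), 4),
    ("movabsq", movabsq_imm64_rax(0x123456789ABCDEF0), 8),
]
for name, enc, imm_size in cases:
    overhead = len(enc) - imm_size
    print(f"{name:8} {len(enc):2} bytes, overhead {overhead}: {enc.hex(' ')}")
```

This gives lengths of 5, 7 and 10 bytes, i.e. overheads of 1, 3 and 2 bytes.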
I don't really think this is as crazy as the article is implying.
It's not crazy at all. This sort of shuffling is pretty standard for RISC ISAs, where it's typical to have fixed-length instructions that are no larger than the registers.
For example, MIPS has no actual "load immediate" instruction. Instead, say you want to load the value 0x12345678 into register t0. You can go ahead and tell your MIPS assembler:
li $t0, 0x12345678
but it will actually emit something like:
lui $t0, 0x1234
ori $t0, $t0, 0x5678
"lui" here is the "load upper immediate" instruction, which loads the target register with the 16-bit immediate shifted left by 16 bits (the lower 16 bits become zero). "ori $t0, $t0, 0x5678" then performs a bitwise OR with its immediate to get the lower 16 bits into the register. If the immediate is 16 bits or smaller, this can be done in a single instruction by using the dedicated zero register (a read-only register that always contains 0x00000000, which turns out to be extremely handy for minimizing an instruction set), e.g. ori $t0, $zero, 0x5678.
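A minimal Python sketch of that expansion and of what the two instructions compute. The function names are hypothetical, and the sketch only uses lui/ori; a real assembler may pick other sequences (for example addiu for small negative values), which this ignores.

```python
def li_expansion(value):
    """Expand 'li $t0, value' into lui/ori, roughly as a MIPS assembler might."""
    value &= 0xFFFFFFFF
    upper, lower = value >> 16, value & 0xFFFF
    if upper == 0:
        # Fits in 16 unsigned bits: a single ori against the zero register.
        return [f"ori $t0, $zero, 0x{lower:04x}"]
    insns = [f"lui $t0, 0x{upper:04x}"]
    if lower:
        insns.append(f"ori $t0, $t0, 0x{lower:04x}")
    return insns

def simulate(insns):
    """Execute the emitted lui/ori sequence and return the final $t0."""
    t0 = 0
    for insn in insns:
        op, rest = insn.split(" ", 1)
        imm = int(rest.rsplit(",", 1)[1], 16)
        if op == "lui":
            # Immediate goes into the upper half; the lower 16 bits are cleared.
            t0 = (imm << 16) & 0xFFFFFFFF
        else:
            # ori: zero-extended 16-bit immediate OR'd with the source register.
            src = 0 if "$zero" in rest else t0
            t0 = src | imm
    return t0

print(li_expansion(0x12345678))
print(hex(simulate(li_expansion(0x12345678))))
```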
SPARC does something similar, with a special "sethi" instruction to set the high 22 bits of a register, using more space for an argument than any other instruction does.
The SPARC call instruction beats that with a 30-bit immediate, sacrificing a huge chunk of the encoding space to be able to reach any target in a 32-bit address space with a single instruction (the displacement counts 4-byte words, so 30 bits cover 2^32 bytes).
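A small Python sketch of the arithmetic, assuming 32-bit SPARC (V8), where sethi writes its 22-bit immediate into bits 31..10 and clears bits 9..0. The split mirrors what the assembler's %hi()/%lo() operators produce; the function names are hypothetical.

```python
def hi_lo(value):
    """Split a 32-bit constant into %hi (top 22 bits) and %lo (low 10 bits)."""
    value &= 0xFFFFFFFF
    return value >> 10, value & 0x3FF

def sethi_then_or(hi22, lo10):
    reg = (hi22 << 10) & 0xFFFFFFFF  # sethi %hi(value), %g1 (low 10 bits cleared)
    return reg | lo10                # or %g1, %lo(value), %g1

# call: 30-bit displacement counted in 4-byte words (instructions are word-aligned)
CALL_DISP_BITS = 30
call_span_bytes = (1 << CALL_DISP_BITS) * 4

hi22, lo10 = hi_lo(0x12345678)
print(hex(hi22), hex(lo10), hex(sethi_then_or(hi22, lo10)))
print(call_span_bytes == 1 << 32)
```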
I think the author is saying what's crazy is the lack of 64-bit immediates in a machine that contains 64-bit registers. There's an instruction to add a 32-bit immediate (and I believe 8 and 16 as well), but not a 64-bit one.
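In 64-bit mode, ALU instructions such as add take either an imm8 (opcode 83 /0) or an imm32 (REX.W 81 /0), and both are sign-extended to 64 bits. A small sketch, with hypothetical helper names, of roughly the check a code generator has to make before it can use a single instruction:

```python
def as_signed64(x):
    """Reinterpret the low 64 bits of x as a two's-complement signed value."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >> 63 else x

def fits_simm8(x):
    # add r/m64, imm8: the byte is sign-extended to 64 bits
    return -(1 << 7) <= as_signed64(x) < (1 << 7)

def fits_simm32(x):
    # add r/m64, imm32: the dword is sign-extended to 64 bits
    return -(1 << 31) <= as_signed64(x) < (1 << 31)

print(fits_simm32(1 << 33))      # 2^33 needs another instruction or a memory operand
print(fits_simm32(0x80000000))   # would be sign-extended to 0xFFFFFFFF80000000
```

Note that even 0x80000000 does not qualify: as an imm32 it would be sign-extended to 0xFFFFFFFF80000000.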
It's rare for programs to contain integer constants that don't fit in 32 bits, and they aren't needed for 64-bit code/global memory references due to the introduction of addressing relative to the instruction pointer.
Also, there is a 15-byte instruction length limit that would have to be raised if 64-bit immediates were allowed on all instructions.
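A rough byte budget makes the point. This sketch assumes one illustrative worst-case form: a locked read-modify-write ALU op on memory with a segment override, an address-size prefix, a SIB byte and a 32-bit displacement. The field sizes are the standard x86 ones; the particular combination is just an example.

```python
MAX_INSN_LEN = 15

# Field sizes in bytes for one assumed worst-case memory-destination ALU op
FIELDS = {
    "lock prefix": 1,
    "segment override": 1,
    "address-size prefix": 1,
    "REX prefix": 1,
    "opcode": 1,
    "ModRM": 1,
    "SIB": 1,
    "disp32": 4,
}

def length(imm_bytes):
    """Total length of that form with an immediate of the given size."""
    return sum(FIELDS.values()) + imm_bytes

for imm_bytes in (4, 8):
    n = length(imm_bytes)
    verdict = "fits" if n <= MAX_INSN_LEN else "exceeds"
    print(f"imm{imm_bytes * 8}: {n} bytes, {verdict} the {MAX_INSN_LEN}-byte limit")
```

With an imm32 this form lands exactly on 15 bytes; swapping in an imm64 would push it to 19.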
No processor out there has 64-bit arithmetic immediates: they aren't useful enough in practice to be worth it, and most other architectures have fixed-length 4-byte instructions anyway, so it's not practical.
It's not crazy: it's just standard operating procedure for RISC-like architectures. In fact, anything that brings the x86 architecture closer to RISC is nice in my book.
Alpha had all sorts of exotic load immediate and shift instructions to help with this: 64-bit registers, 32-bit instructions, and, though I forget the exact number, only 20-ish bits of immediate value.
It seemed like slightly more of a chore then; with x86-64 you can use a memory operand with a lot of instructions.
Populating x86-64 floating point registers is also an amusing subject.
The obvious instruction for loading a (64-bit) float into an xmm register is movsd. With a memory source operand, the higher part of the register is zeroed, which is what you want. No problem.
Now the fun part: if the source is not memory but another xmm register, the higher part of the register is not zeroed. This induces a false dependency on the previous value of the destination register that can cause performance issues. To avoid this problem, such register-register copies should be done with a packed move instruction. (Or vmovsd, but that was added much later.)
The obvious packed move instruction for 64-bit floats is movapd, but we can do better than that by using movaps: it is still a float domain instruction but is a byte smaller, since it doesn't need the 0x66 prefix.
So the optimal way to move a single double from one register to another is to use a vector move of the wrong type.
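A toy Python model of the two 64-bit lanes of an xmm register shows where the false dependency comes from. It models lane contents only, not timing, and the helper names are hypothetical. The register-to-register encodings for xmm0 <- xmm1 account for the one-byte difference.

```python
from dataclasses import dataclass

@dataclass
class Xmm:
    lo: float  # lane 0: the scalar double
    hi: float  # lane 1

def movsd_load(mem_value):
    # movsd xmm, m64: writes lane 0 and zeroes lane 1, so the old value is irrelevant
    return Xmm(mem_value, 0.0)

def movsd_reg(dst, src):
    # movsd xmm, xmm: only lane 0 is written; lane 1 keeps dst's old value,
    # so the result depends on the previous contents of dst
    return Xmm(src.lo, dst.hi)

def movaps_reg(dst, src):
    # movaps xmm, xmm: full 128-bit copy; dst is deliberately unused
    return Xmm(src.lo, src.hi)

# Encodings for xmm0 <- xmm1 (ModRM 0xC1: mod=11, reg=xmm0, rm=xmm1)
ENCODINGS = {
    "movsd": bytes.fromhex("f20f10c1"),
    "movapd": bytes.fromhex("660f28c1"),
    "movaps": bytes.fromhex("0f28c1"),
}

old, src = Xmm(9.0, 7.0), Xmm(1.0, 2.0)
print(movsd_reg(old, src), movaps_reg(old, src))
for name, enc in ENCODINGS.items():
    print(name, len(enc), enc.hex(" "))
```

movaps also copies the upper lane, which for a scalar double is just harmless leftover data.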
> "it is impossible to add 2^33 to rax using one instruction only."
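This can in fact be done with a memory operand. I'm not sure how the performance compares to a 64-bit load immediate followed by an add, but this will do it (NASM syntax): add rax, [rel k], where k labels a qword holding 2^33 (k dq 0x200000000); the name k is just illustrative.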
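For a rough size comparison, here is a sketch that hand-encodes both options: the single add with a RIP-relative memory operand (which also needs the 8-byte constant somewhere in memory) versus a movabs into a scratch register followed by a register add. Using rcx as the scratch register and a zero displacement are assumptions for illustration; the real displacement would be filled in by the assembler. This says nothing about speed.

```python
import struct

def add_rax_riprel(disp32):
    # REX.W 03 /r: add rax, [rip + disp32]; ModRM 0x05 = mod 00, reg rax, rm 101 (RIP-relative)
    return bytes([0x48, 0x03, 0x05]) + struct.pack("<i", disp32)

def mov_rcx_imm64(imm):
    # REX.W B8+rd io with rd = 1 (rcx)
    return bytes([0x48, 0xB9]) + struct.pack("<Q", imm)

ADD_RAX_RCX = bytes([0x48, 0x01, 0xC8])  # REX.W 01 /r: add rax, rcx

K = 1 << 33
one_insn = add_rax_riprel(0)              # placeholder displacement
constant = struct.pack("<Q", K)           # the qword the add reads from memory
two_insn = mov_rcx_imm64(K) + ADD_RAX_RCX

print(len(one_insn), "code bytes +", len(constant), "data bytes")
print(len(two_insn), "code bytes, clobbers rcx")
```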
Thank you, I fixed that. In my head I knew a REX prefix was necessary and thought it was sufficient, but i386 has shortcuts to load an immediate into a register that are not available in the 64-bit version!