The code keeps using high registers for no discernible reason, instead of sticking with EAX-EDI; this means that each such instruction is 1 byte longer than it could be, because it needs a REX prefix.
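To make the size difference concrete, here is a minimal sketch that hand-encodes a 32-bit register-to-register MOV (opcode 89 /r). The helper name `encode_mov_r32_r32` and the register table are my own for illustration, and I am assuming the "high registers" in question are R8D-R15D, since the original assembly is not shown here.

```python
# Register numbers as used in the ModRM byte; 8-15 need a REX prefix.
REGS = {name: i for i, name in enumerate(
    ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
     "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"])}


def encode_mov_r32_r32(dst, src):
    """Encode `mov dst, src` with opcode 89 /r (MOV r/m32, r32).

    Hypothetical helper for illustration only; it handles just the
    register-direct form with 32-bit operands.
    """
    d, s = REGS[dst], REGS[src]
    # REX = 0100WRXB: R extends the reg field (src), B extends r/m (dst).
    rex = 0x40 | ((s >> 3) << 2) | (d >> 3)
    # mod = 11 selects register-direct addressing.
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)
    # A REX prefix is only emitted when one of the extension bits is set.
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x89, modrm])


print(encode_mov_r32_r32("eax", "ebx").hex())  # 89d8   (2 bytes)
print(encode_mov_r32_r32("r8d", "r9d").hex())  # 4589c8 (3 bytes)
```

The same operation costs one extra byte as soon as either operand is one of the upper eight registers.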
This has consequences for the instruction fetch and decode circuitry: on the Core 2, the front end can only read 16 bytes of instructions (or at most 6 instructions) per cycle. It is possible that the extraneous MOV instructions simply result in better instruction alignment.
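As a rough illustration of the alignment point, the sketch below lays out a sequence of instruction lengths from a starting offset and counts how many instructions straddle a 16-byte boundary. This is a toy model under my own assumptions (the function name `straddlers` and the example lengths are made up), not a simulation of the real Core 2 front end.

```python
FETCH_BYTES = 16  # fetch window size quoted above for the Core 2


def straddlers(start, lengths):
    """Count instructions that cross a 16-byte fetch boundary.

    `start` is the byte offset of the first instruction and `lengths`
    the size of each instruction in order. Hypothetical helper: it only
    models where boundaries fall, not how the decoders react to them.
    """
    count = 0
    offset = start
    for length in lengths:
        first_block = offset // FETCH_BYTES
        last_block = (offset + length - 1) // FETCH_BYTES
        if first_block != last_block:
            count += 1
        offset += length
    return count


# Six 3-byte instructions: the last one occupies bytes 15..17.
print(straddlers(0, [3, 3, 3, 3, 3, 3]))  # 1
# Making the first one a byte longer shifts the rest past the boundary.
print(straddlers(0, [4, 3, 3, 3, 3, 3]))  # 0
```

In this model, adding a byte (or a whole extra instruction) early in the sequence can move a later instruction off a boundary, which is the kind of effect the alignment hypothesis relies on.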
The code keeps using high registers for no discernible reason
The function[1] inside of which the assembly is located declares a lot of variables as well. I don't know how well Free Pascal does register allocation, but perhaps this avoids clobbering the registers it prefers. Alignment seems like a likely candidate, but it isn't likely to explain why the exact ordering of the extra ops makes a difference.
Would help to see the assembly for the whole function.
This has consequences for the instruction fetch and decode circuitry: on the Core 2, the front end can only read 16 bytes of instructions (or at most 6 instructions) per cycle. It is possible that the extraneous MOV instructions simply result in better instruction alignment.