Is there a performance advantage to using a microcoded instruction vice explicit...

rep_lodsb · on April 4, 2023

A non-repeated string opcode is one byte for the CPU to fetch vs. several for the corresponding "RISC-like" series of operations (load, increment SI, store/compare, increment DI).

With the REP prefix (another single byte), an entire loop could run in microcode without any additional instruction fetches. Remember that each memory access took 4 clock cycles and there was no cache yet.

--

Eliminating opcode fetches might still speed things up today in some situations, but modern x86 cores are optimized to decode and dispatch several simple operations each cycle without going through a microcode ROM; complex instructions that do need the microcode ROM take a slower path.

Also the fact that compilers didn't emit most of the more complex/specialized instructions led to Intel not spending much effort on optimizing those.

userbinator · on April 5, 2023

> Also the fact that compilers didn't emit most of the more complex/specialized instructions led to Intel not spending much effort on optimizing those.

I believe some compilers will still emit rep movs for memcpy, rep stos for memset, repe cmps for memcmp, and sometimes even repne scas for memchr/strlen if given the appropriate options.

bell-cot · on April 4, 2023

Directly implementing many complex instructions in silicon is fast, but takes a lot of transistors. Microcode running on a much simpler "actual" processor is slower, but requires far fewer transistors. IIRC, nearly all substantial CPUs of the past ~40 years have mixed the two approaches.

(Microcode also makes it possible to design in bug-patching features, possibly including patches for bugs in your directly implemented instructions.)