Early ARM was more purely RISC and strove for simplicity. No modern chip that's actually a serious contender in the marketplace can be cleanly classified as pure RISC or pure CISC; that debate has been mostly set aside for causing too much philosophy to get in the way of real-world performance.
Similar in principle, different in practice. What the A57 (and most x86 processors) do is ditch the 1:1 mapping between the instruction set and what the functional units execute. A simple example: in x86, a MOV with a destination in memory is broken down into two micro-ops: one to calculate the store address, and another to put the store data on the internal memory bus. The two operations are executed by different functional units in the backend. In fact, all arithmetic instructions with memory operands are ultimately broken down into separate ALU and load-store micro-ops.
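To make that cracking concrete, here's a toy Python sketch of a decoder doing the split. The uop names (STA, STD, LOAD, ALU) and the mapping are invented for illustration; real internal encodings aren't public:

    # Toy model of a decoder cracking x86 instructions with memory
    # operands into micro-ops. The uop names are made up.

    def decode(instr):
        """Map one architectural instruction to a list of micro-ops."""
        if instr == "mov [rdi], eax":
            # A store splits in two: compute the address, then move the data.
            return ["STA  addr <- rdi",        # store-address uop (AGU)
                    "STD  mem[addr] <- eax"]   # store-data uop
        if instr == "add eax, [rsi]":
            # Arithmetic with a memory source: separate load and ALU uops.
            return ["LOAD tmp <- mem[rsi]",
                    "ALU  eax <- eax + tmp"]
        # Simple register-to-register instructions map 1:1.
        return [f"ALU  {instr}"]

    for i in ("mov [rdi], eax", "add eax, [rsi]", "add eax, ebx"):
        print(f"{i:18} -> {decode(i)}")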
Microcode also breaks bigger instructions down into internal ones, but the implementation is different. Usually, a microcoded instruction causes the CPU to execute from an internal ROM, and that indirection usually comes with a performance cost. With micro-operations, the decoder usually generates the micro-ops directly.
Microcode implements (typically complex) instructions in terms of simpler ones, and it is stored on the chip. The simpler instructions may or may not be a subset of the existing instructions of the CPU. Executing microcode may (and, I guess, does, certainly historically) even use the same instruction decoder that 'normal' instructions use. If so, you can see microcode as hardware that traps the CPU when it encounters an instruction that the hardware claims to handle but doesn't, combined with a very fast read-only instruction cache that emulates those instructions in software.
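A toy Python sketch of that trap-plus-ROM picture; the instruction sets and ROM contents here are entirely made up, the point is only the control flow:

    # Crude model of the "trap into a microcode ROM" view above.
    HARDWIRED = {"add", "sub", "mov"}   # handled directly by hardware

    # Microcode ROM: complex instructions expand to simple sequences.
    MICROCODE_ROM = {
        "rep_movsb": ["load", "store", "dec_count", "branch_if_nonzero"],
        "cpuid":     ["read_id", "mov", "mov", "mov"],
    }

    def execute(instr):
        if instr in HARDWIRED:
            print(f"hardware executes {instr}")
        elif instr in MICROCODE_ROM:
            # The "trap": fetch is redirected into the internal ROM;
            # to the rest of the pipeline the routine looks like
            # ordinary simple instructions.
            print(f"{instr} traps to microcode:")
            for uinstr in MICROCODE_ROM[instr]:
                print(f"  ROM issues {uinstr}")
        else:
            raise ValueError(f"undefined instruction {instr}")

    execute("add")
    execute("rep_movsb")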
Some CPUs split complex (for some definition of complex) operations into simpler units called micro-instructions, because having more, simpler instructions makes it possible to make better use of the available hardware. (A slightly flawed comparison is bin packing: it becomes way easier to get decent results there if you cut the packages into smaller parts.)
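Taking the bin-packing comparison literally, with invented item sizes and bin capacities:

    # Bins are issue slots of capacity 3; item sizes are how much work
    # an instruction carries. All numbers are invented for illustration.

    def pack(items, capacity=3):
        """First-fit bin packing; returns the number of bins used."""
        bins = []
        for item in items:
            for b in bins:
                if sum(b) + item <= capacity:
                    b.append(item)
                    break
            else:
                bins.append([item])
        return len(bins)

    big_ops   = [2, 2, 2]             # three complex instructions
    small_ops = [1, 1, 1, 1, 1, 1]    # the same work cut into micro-ops

    print(pack(big_ops))    # 3 bins: each 2-unit op strands 1 unit of space
    print(pack(small_ops))  # 2 bins: smaller pieces fill the bins exactly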
ARM does have microcode (though early ARM cores didn't).
Basically if you don't have microcode, you kinda tie your instruction set to your physical architecture, which makes it hard to progress.
For example, at the time of the Pentium 3, everything internal was 64 bits wide - execution units, buses, register renaming. SSE introduced 128 bit vector instructions, which didn't fit. For this reason, Intel split each one into 2 64 bit micro-instructions.
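Roughly what that halving looks like, sketched in Python. The real Pentium 3 uops mostly handled packed floats; this packed integer add is just an illustration of operating on the two 64-bit halves independently:

    # Model of a 128-bit packed operation executed on 64-bit-wide
    # hardware by splitting it into two 64-bit micro-ops.

    M64 = (1 << 64) - 1

    def packed_add_128(xmm_a, xmm_b):
        """128-bit packed add (two 64-bit lanes) as two 64-bit uops."""
        # uop 1: operate on the low 64-bit half
        lo = ((xmm_a & M64) + (xmm_b & M64)) & M64
        # uop 2: operate on the high 64-bit half, independently
        hi = (((xmm_a >> 64) + (xmm_b >> 64)) & M64) << 64
        return hi | lo

    a = (7 << 64) | 5
    b = (1 << 64) | 2
    assert packed_add_128(a, b) == (8 << 64) | 7
    print(hex(packed_add_128(a, b)))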
ARM generally doesn't have patchable microcode, which might be the source of your confusion?
You are confusing the concept of uops (or micro-instructions) with microcode.
Microcode is when the CPU runs a routine out of a microcode ROM to implement an instruction. In CISC days this was often the case for all instructions (these were termed microcoded CPUs).
The same architecture might have cheap microcoded and high-end "hardwired" implementations.
There's a conceptual similarity between uops and microcode, to quote the WP microcode article: "Modern CISC/RISC implementations, e.g. x86 designs, decode instructions into dynamically buffered micro-operations with instruction encodings similar to traditional fixed microcode. Ordinary static microcode is used as hardware assistance for complex multistep operations such as auto-repeating instructions and for transcendental functions in the floating point unit; it is also used for special purpose instructions (such as CPUID) and internal control and configuration purposes."
>> Instructions are first fetched, then decoded into internal micro-operations
Is 'micro-operation' the same as microcode? I'm not familiar with the ARM Cortex-A architecture, but I always thought ARM didn't have microcode?