Early ARM was more purely RISC and strove for simplicity. No modern chip that's actually a serious contender in the marketplace can be cleanly classified as pure RISC or pure CISC; that debate has been mostly set aside for causing too much philosophy to get in the way of real-world performance.
Similar in principle, different in practice. What the A57 (and most x86 processors) do is ditch the 1:1 mapping between the instruction set and what the functional units execute. A simple example: in x86, a MOV with a destination in memory is broken down into two micro-ops: one to calculate the store address, and another to put the store data on the internal memory bus. The two operations are executed by different functional units in the backend. In fact, all arithmetic instructions with memory operands are ultimately broken down into separate ALU and load-store micro-ops.
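To make that cracking concrete, here's a toy Python sketch of a decoder doing the split. The uop names (STA, STD, LOAD, ALU) and the mapping are invented for illustration; real internal encodings aren't public:

    # Toy model of a decoder cracking x86 instructions with memory
    # operands into micro-ops. The uop names are made up.

    def decode(instr):
        """Map one architectural instruction to a list of micro-ops."""
        if instr == "mov [rdi], eax":
            # A store splits in two: compute the address, then move the data.
            return ["STA  addr <- rdi",        # store-address uop (AGU)
                    "STD  mem[addr] <- eax"]   # store-data uop
        if instr == "add eax, [rsi]":
            # Arithmetic with a memory source: separate load and ALU uops.
            return ["LOAD tmp <- mem[rsi]",
                    "ALU  eax <- eax + tmp"]
        # Simple register-to-register instructions map 1:1.
        return [f"ALU  {instr}"]

    for i in ("mov [rdi], eax", "add eax, [rsi]", "add eax, ebx"):
        print(f"{i:18} -> {decode(i)}")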
Microcode also breaks bigger instructions down into internal ones, but the implementation is different. Usually, a microcoded instruction causes the CPU to execute from an internal ROM, and that indirection usually comes with a performance cost. With micro-operations, the decoder usually generates the micro-ops directly.
Microcode implements (typically complex) instructions in terms of simpler ones, and it is stored on the chip. The simpler instructions may or may not be a subset of the existing instructions of the CPU. Executing microcode may (and, I guess, does, certainly historically) even use the same instruction decoder that 'normal' instructions use. If so, you can see microcode as hardware that traps the CPU when it encounters an instruction that the hardware claims to handle but doesn't, combined with a very fast read-only instruction cache that emulates those instructions in software.
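A toy Python sketch of that trap-plus-ROM picture; the instruction sets and ROM contents here are entirely made up, the point is only the control flow:

    # Crude model of the "trap into a microcode ROM" view above.
    HARDWIRED = {"add", "sub", "mov"}   # handled directly by hardware

    # Microcode ROM: complex instructions expand to simple sequences.
    MICROCODE_ROM = {
        "rep_movsb": ["load", "store", "dec_count", "branch_if_nonzero"],
        "cpuid":     ["read_id", "mov", "mov", "mov"],
    }

    def execute(instr):
        if instr in HARDWIRED:
            print(f"hardware executes {instr}")
        elif instr in MICROCODE_ROM:
            # The "trap": fetch is redirected into the internal ROM;
            # to the rest of the pipeline the routine looks like
            # ordinary simple instructions.
            print(f"{instr} traps to microcode:")
            for uinstr in MICROCODE_ROM[instr]:
                print(f"  ROM issues {uinstr}")
        else:
            raise ValueError(f"undefined instruction {instr}")

    execute("add")
    execute("rep_movsb")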
Some CPUs split complex (for some definition of complex) operations into simpler units called micro-instructions, because having more, simpler instructions makes it possible to make better use of the available hardware. (A slightly flawed comparison is bin packing: it becomes way easier to get decent results there if you cut the packages into smaller parts.)
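Taking the bin-packing comparison literally, with invented item sizes and bin capacities:

    # Bins are issue slots of capacity 3; item sizes are how much work
    # an instruction carries. All numbers are invented for illustration.

    def pack(items, capacity=3):
        """First-fit bin packing; returns the number of bins used."""
        bins = []
        for item in items:
            for b in bins:
                if sum(b) + item <= capacity:
                    b.append(item)
                    break
            else:
                bins.append([item])
        return len(bins)

    big_ops   = [2, 2, 2]             # three complex instructions
    small_ops = [1, 1, 1, 1, 1, 1]    # the same work cut into micro-ops

    print(pack(big_ops))    # 3 bins: each 2-unit op strands 1 unit of space
    print(pack(small_ops))  # 2 bins: smaller pieces fill the bins exactly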
ARM does have microcode (though early ARM cores didn't).
Basically if you don't have microcode, you kinda tie your instruction set to your physical architecture, which makes it hard to progress.
For example, at the time of the Pentium 3, everything internal was 64 bits wide - execution units, buses, register renaming. SSE introduced 128 bit vector instructions, which didn't fit. For this reason, Intel split each one into 2 64 bit micro-instructions.
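Roughly what that halving looks like, sketched in Python. The real Pentium 3 uops mostly handled packed floats; this packed integer add is just an illustration of operating on the two 64-bit halves independently:

    # Model of a 128-bit packed operation executed on 64-bit-wide
    # hardware by splitting it into two 64-bit micro-ops.

    M64 = (1 << 64) - 1

    def packed_add_128(xmm_a, xmm_b):
        """128-bit packed add (two 64-bit lanes) as two 64-bit uops."""
        # uop 1: operate on the low 64-bit half
        lo = ((xmm_a & M64) + (xmm_b & M64)) & M64
        # uop 2: operate on the high 64-bit half, independently
        hi = (((xmm_a >> 64) + (xmm_b >> 64)) & M64) << 64
        return hi | lo

    a = (7 << 64) | 5
    b = (1 << 64) | 2
    assert packed_add_128(a, b) == (8 << 64) | 7
    print(hex(packed_add_128(a, b)))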
ARM generally doesn't have patchable microcode, which might be the source of your confusion?
You are confusing the concept of uops (or micro-instructions) with microcode.
Microcode is when the CPU runs a routine out of a microcode ROM to implement an instruction. In CISC days this was often the case for all instructions (these were termed microcoded CPUs).
The same architecture might have cheap microcoded and high-end "hardwired" implementations.
There's a conceptual similarity between uops and microcode, to quote the WP microcode article: "Modern CISC/RISC implementations, e.g. x86 designs, decode instructions into dynamically buffered micro-operations with instruction encodings similar to traditional fixed microcode. Ordinary static microcode is used as hardware assistance for complex multistep operations such as auto-repeating instructions and for transcendental functions in the floating point unit; it is also used for special purpose instructions (such as CPUID) and internal control and configuration purposes."
>> Instructions are first fetched, then decoded into internal micro-operations
Is 'micro-operation' the same as microcode? I'm not familiar with the ARM Cortex-A architecture, but I always thought ARM didn't have microcode?