
Easy-to-parse instructions? Thumb-2 is very much a variable-length encoding. Granted, there's no equivalent variable-length AArch64 encoding, but basically all AArch64 implementations still support Thumb in AArch32 mode.

All the modern research suggests that the decode stage isn't a significant differentiator nowadays; instruction density matters more and more as CPUs become ever faster relative to memory access.




Variable length is okay as long as it's easy to calculate the length.

x86 makes it really awkward to pull out several instructions at a time for parallel decoding.


The problem is that this calculation takes extra cycles, and you do not know where the next instruction starts until it completes. It serializes what should be a parallel process. x86-64 chips use crazy hacks like predecode caches and tables to work around this, but these add more transistors and power consumption.
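To make the serialization concrete, here's a rough C sketch (my own illustration; get_length() is a hypothetical stand-in for a real x86 length decoder, not an actual API). With a variable-length encoding, each instruction's start depends on the previous instruction's length; with a fixed 4-byte encoding, every start can be computed independently.

    /* Hypothetical sketch: why variable-length decode serializes boundary
       finding.  get_length() is an assumed stand-in for a real length decoder. */
    #include <stddef.h>

    size_t get_length(const unsigned char *insn);    /* assumed helper */

    /* Variable-length ISA: the loop carries a serial dependency, because
       instruction i+1 cannot be located until instruction i's length is known. */
    void find_starts_variable(const unsigned char *code, size_t starts[], size_t n)
    {
        size_t off = 0;
        for (size_t i = 0; i < n; i++) {
            starts[i] = off;
            off += get_length(code + off);            /* must complete before i+1 */
        }
    }

    /* Fixed 4-byte ISA: every start is simply 4*i, so all n instructions can be
       handed to parallel decoders immediately. */
    void find_starts_fixed32(size_t starts[], size_t n)
    {
        for (size_t i = 0; i < n; i++)
            starts[i] = 4 * i;                        /* independent of the others */
    }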


In processors, the I-cache and decode consume a disproportionate amount of power relative to their area. I'd also note that all the latest high-performance ARM cores include a decoded-instruction (micro-op) cache, because the power cost of the cache plus the lookup is lower (and much faster) than a full decode cycle. Of course, there are diminishing returns with cache size; past a point it is less about saving power and more about bypassing part of the pipeline to improve performance, even at the cost of power efficiency.

x86 instruction size ranges from 8 to 120 bits (1-15 bytes). Since common instructions often fit in just 8 or 16 bits, there are power savings to be had from the smaller I-cache footprint per instruction. That comes at a severe decode cost though, as every single byte must be examined to see whether it terminates the instruction. Once the length is determined, the decoder must then work out how many micro-ops the instruction translates into so they can be handed off to the scheduler.
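As a rough feel for what that byte-by-byte check involves, here's a toy length function covering only a handful of opcodes (my own illustration, not taken from any real decoder); a real one also has to handle legacy prefixes, REX/VEX/EVEX, and every ModRM/SIB/displacement/immediate combination.

    /* Toy sketch: instruction length for a tiny subset of x86-64 opcodes,
       assuming no prefixes.  Real decoders must handle far more cases. */
    static int x86_length_subset(const unsigned char *p)
    {
        if (p[0] == 0x90) return 1;                      /* NOP */
        if (p[0] == 0xC3) return 1;                      /* RET */
        if (p[0] >= 0x50 && p[0] <= 0x57) return 1;      /* PUSH r64 */
        if (p[0] == 0xE9) return 5;                      /* JMP rel32: opcode + imm32 */
        if (p[0] == 0x89) {                              /* MOV r/m32, r32: opcode + ModRM + ... */
            unsigned mod = p[1] >> 6, rm = p[1] & 7;
            int len = 2;                                 /* opcode + ModRM */
            if (mod != 3 && rm == 4) len += 1;           /* SIB byte */
            if (mod == 1) len += 1;                      /* disp8 */
            else if (mod == 2) len += 4;                 /* disp32 */
            else if (mod == 0 && rm == 5) len += 4;      /* disp32 (RIP-relative) */
            /* (mod==0 with a SIB base of 101 also adds disp32; omitted here) */
            return len;
        }
        return -1;                                       /* not handled in this sketch */
    }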

ARM breaks down into v7 and v8. The original Thumb instructions were slow but saved I-cache. Thumb-2 was faster, with some I-cache savings, but basically required three different decoders. ARMv8 in 64-bit mode has NO 16-bit instructions. This reduces the decoder overhead but obliterates the potential I-cache savings; no doubt that is the reason I-cache sizes doubled.

RISC-V is not being discussed here, but is the most interesting IMO. The low 2 bits of an instruction tag it as 32-bit or 16-bit (there are reserved bit patterns to allow longer instructions, but I don't believe those are implemented or likely to be implemented any time soon). This fixed bit pattern means the length can be determined from the first two bits alone, before decoding anything else. 3 of the 4 patterns are reserved for 16-bit use, which reduces the encoding-space penalty (effectively the compressed instructions still get roughly 15 bits). The result is something 3-5% less dense than Thumb-2, but around 15% MORE dense than x86, all without the huge decode penalties of x86 or the multiple modes and mode-switching of Thumb. In addition, adding the RVC instructions reduces I-cache misses almost as much as doubling the I-cache size, which is another huge power-consumption win, and it doesn't hurt overall performance either (in fact, performance should usually increase for the same I-cache size).
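For comparison, the RISC-V length check really is that small. A minimal sketch, assuming only the ratified 16-bit and 32-bit formats (the reserved longer formats are ignored):

    #include <stdint.h>

    /* RISC-V: the low two bits of the first 16-bit parcel give the length.
       Any value other than 0b11 means a 16-bit compressed (RVC) instruction;
       0b11 means a 32-bit instruction.  Reserved >=48-bit formats ignored. */
    static int riscv_insn_length(uint16_t first_parcel)
    {
        return (first_parcel & 0x3) == 0x3 ? 4 : 2;      /* length in bytes */
    }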



