That doesn't make any sense. The ROB comes after instructions have been cracked into uops; the internal format and length of uops are "whatever is easiest for the design", since they're not visible to the outside world.
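
To illustrate (a purely hypothetical sketch, not any real chip's encoding): internally a uop can be whatever record the designers find convenient, since it never leaves the core, e.g. something like:

    #include <stdint.h>

    /* Illustrative only: nothing forces this to resemble the
       architectural instruction encoding, and its width (60 bits,
       120 bits, whatever) is a private design choice. */
    typedef struct {
        uint16_t opcode;          /* internal operation code */
        uint8_t  dst, src1, src2; /* physical register numbers */
        int32_t  imm;             /* immediate operand, if any */
        uint8_t  flags;           /* e.g. "sets condition codes" */
    } uop;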

This argument does apply to the L1 instruction cache, which sits before decode. (It does not apply to uop caches/L0 caches, though it is related to them: they are most useful for CISCy designs, whose instructions decode in complicated ways into many uops.)

Maybe it wasn't clear, but the article I linked is saying that compared to M1, x86 architectures are decode-limited, because parallel decoding with variable-length instructions is tricky. Intel and AMD (again according to the linked article) have at most 4 decoders, while M1 has 8.
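
A minimal sketch of why that's tricky (all names and the toy length rule here are made up for illustration): with fixed 4-byte instructions every decoder knows its start offset up front, whereas with variable-length instructions each decoder has to wait for the lengths of all earlier instructions in the fetch window:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define N_DECODERS 8

    /* Fixed-width (aarch64-style): decoder i reads offset 4*i,
       independent of every other instruction, so all decoders
       can run in parallel. */
    static void decode_fixed(const uint8_t *window) {
        for (int i = 0; i < N_DECODERS; i++)
            printf("decoder %d: byte 0x%02x\n", i, (unsigned)window[4 * i]);
    }

    /* Toy length rule standing in for x86's complex encoding. */
    static size_t toy_length(const uint8_t *insn) {
        return (insn[0] & 0x3) + 1; /* pretend low bits give 1..4 bytes */
    }

    /* Variable-length (x86-style): instruction i's start isn't known
       until lengths 0..i-1 are, so length-finding is inherently serial
       (real designs speculate lengths or cache instruction boundaries). */
    static void decode_variable(const uint8_t *window) {
        size_t off = 0;
        for (int i = 0; i < N_DECODERS; i++) {
            size_t len = toy_length(window + off); /* serial dependency */
            printf("decoder %d: offset %zu, len %zu\n", i, off, len);
            off += len;
        }
    }

    int main(void) {
        uint8_t window[64] = {0x03, 0x11, 0x22, 0x33, 0x00, 0x42,
                              0x01, 0x99, 0x02, 0xaa, 0xbb, 0x03};
        decode_fixed(window);
        decode_variable(window);
        return 0;
    }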

So yes, the ROB is after decoding, but surely there's little point in making the ROB larger than the decoders can keep reasonably full.
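
Back-of-the-envelope (illustrative numbers, not any real chip's specs): the useful ROB size scales with how many uops the front end sustains per cycle times how many cycles of latency you want to hide, which is why front-end width and ROB size tend to grow together:

    #include <stdio.h>

    int main(void) {
        int decode_width = 8;  /* uops delivered per cycle (made up) */
        int latency      = 80; /* cycles of cache-miss latency to hide */

        /* The ROB must hold everything issued while waiting, so: */
        printf("~%d ROB entries\n", decode_width * latency); /* ~640 */
        return 0;
    }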

Well-intentioned as that article may be, it makes plenty of mistakes. For a rather glaring one: no, uops are not tied to out-of-order (OoO) execution.

Secondly, it ignores the existence of the uop caches that x86 designs use precisely so they don't need such wide decoders. Some ARM designs also use uop caches, FWIW, since they can be more power-efficient.
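
Rough sketch of the uop-cache idea (structure and names are hypothetical): on a hit the front end streams already-decoded uops and the expensive variable-length decoders sit idle, which is why a design can get away with fewer of them and save power:

    #include <stdint.h>
    #include <string.h>

    typedef struct { uint32_t bits; } uop; /* opaque internal format */

    #define WAYS      64
    #define LINE_UOPS 6

    typedef struct {
        uint64_t tag;     /* fetch address this line was built from */
        int      valid, n;
        uop      uops[LINE_UOPS];
    } uop_line;

    static uop_line cache[WAYS];

    /* Stand-in for the real decoders (hypothetical). */
    static int decode_x86(uint64_t pc, uop *out) {
        out[0].bits = (uint32_t)pc; /* pretend we decoded one uop */
        return 1;
    }

    /* Hit: skip decode entirely. Miss: decode once, fill the line. */
    int fetch_uops(uint64_t pc, uop *out) {
        uop_line *line = &cache[(pc >> 4) % WAYS];
        if (!line->valid || line->tag != pc) {
            line->n     = decode_x86(pc, line->uops);
            line->tag   = pc;
            line->valid = 1;
        }
        memcpy(out, line->uops, line->n * sizeof(uop));
        return line->n;
    }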

That doesn't mean fixed-width decoding, as on aarch64, isn't an advantage; it certainly is. And the M1 is a very impressive design, though it also helps that it's fabbed on the latest and greatest process.
