https://en.wikichip.org/wiki/amd/microarchitectures/zen_2

The actual instruction decoder is 4-wide. However, the micro-op cache has 8-wide issue, and the dispatch unit can issue 6 instructions per cycle (and can retire 8 per cycle to avoid ever being retire-bound). In practice, Zen 2 generally acts like a 6-wide machine.
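
As a rough sketch of why the 6 is the number that sticks: if you treat the front end as a chain of stages, sustained throughput is capped by the narrowest one. The model below is an assumption for illustration only (the function and constant names are made up, and it ignores stalls, the instruction vs. micro-op distinction, and micro-op cache misses); the widths are the ones quoted above.

```python
def sustained_width(supply_width, dispatch_width, retire_width):
    """Upper bound on ops per cycle: the narrowest stage in the chain wins."""
    return min(supply_width, dispatch_width, retire_width)


# Zen 2 widths as quoted in this comment (per cycle).
DECODE = 4      # instruction decoder
UOP_CACHE = 8   # micro-op cache
DISPATCH = 6    # dispatch unit
RETIRE = 8      # retirement

# Hot loop served from the micro-op cache: dispatch is the limit.
from_uop_cache = sustained_width(UOP_CACHE, DISPATCH, RETIRE)

# Cold code that has to go through the decoder: decode is the limit.
from_decoder = sustained_width(DECODE, DISPATCH, RETIRE)

print(from_uop_cache, from_decoder)  # prints: 6 4
```

So under this simplified model the machine looks 6-wide whenever the micro-op cache is hitting, and 4-wide when it has to fall back to the decoder.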

Oh, on this terminology: x86 instructions are 1-15 bytes wide (averaging around 3-4 bytes in most code). n-wide decode refers to decoding n instructions at a time.



Thanks for the link! Yeah, those are basically the numbers I also found -- although the number of instructions decoded per clock cycle is a different metric from the number of µops that can be issued, so that feels a bit like moving the goalposts.

But, fair enough, for practical applications the latter may matter more. For an apples-to-apples comparison (pun not intended) it'd be interesting to know what the corresponding number for the M1 is; while it is ARM and thus RISC, one might still expect that there can be more than one µop per instruction, at least in some cases?

Of course then we might also want to talk about how certain complex instructions on x86 can actually require more than one cycle to decode (at least that was the case for Zen 1) ;-). But I think those are not that common.

Ah well, this is just intellectual curiosity, at the end of the day most of us don't really care, we just want our computers to be as fast as possible ;-).


I have usually heard the top-line number as the issue width, not the decode width (so Zen 2 is a 6-wide issue machine). Most executed instructions are in loops, so the uop cache gives you its full benefit most of the time.

On the Apple chip: I believe the entire M1 decode path is 8-wide, including the dispatch unit, to get the performance it gets. ARM instructions are 4 bytes wide, and don't generally need the same type of micro-op splitting that x86 instructions need, so the frontend on the M1 is probably significantly simpler than the Zen 2 frontend.

Some of the more complex ops may have separate micro-ops, but I don't think they publish that. One thing to note is that ARM cores often do op fusion (x86 cores also do op fusion), but with a fixed issue width, there are very few places where this would move the needle. The textbook example is fusing DIV and MOD into one two-input, two-output instruction (the x86 DIV instruction computes both, but not the ARM DIV instruction).
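
To make the DIV/MOD case concrete, here is a small sketch of the arithmetic only (not of what any core actually does internally; the function names are made up). The naive form needs two divides. The shared form gets the remainder from the quotient with a multiply-subtract, which is the usual AArch64 pattern (UDIV followed by MSUB). It assumes non-negative `a` and positive `b`.

```python
def quot_rem_separate(a, b):
    """Quotient and remainder as two independent operations."""
    q = a // b  # first divide
    r = a % b   # naively, a second divide
    return q, r


def quot_rem_shared(a, b):
    """One divide, then recover the remainder with a multiply-subtract."""
    q = a // b
    r = a - q * b  # no second divide needed
    return q, r


print(quot_rem_separate(17, 5), quot_rem_shared(17, 5))  # prints: (3, 2) (3, 2)
```

Both forms give the same pair; the point of fusing is that the pair comes out of a single divide.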



