Jim Keller's team has a working 8-decode 10-issue RISC-V cpu in the labs. Anothe...

brucehoult · on Sept 19, 2022

I'm not sure it makes sense to describe (or implement) a RISC-V instruction decoder as "4-wide" or "8-wide".

It makes more sense to talk about how many bytes of code are decoded per clock cycle.

If it's 16 bytes (like current x86) then you get between 4 and 8 instructions, depending on the percentage of "C" extension code. If it's 32 bytes (like M1) then you get between 8 and 16 instructions, with a typical average of 11 point something. Or, depending on decoder design, you might always get 16 instructions, but some of them are NOPs. That can be easier to deal with, because later stages will already be turning MOV (and other instructions) into NOPs and dropping all the NOPs (including NOPs used for branch target alignment) before they reach the back end.
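To make the arithmetic concrete, here is a minimal Python sketch of the instruction count per fetch window. The function name and the 60% compressed fraction are my own assumptions for illustration, not measured figures, and the sketch ignores instructions that straddle the window edge.

```python
def instructions_per_window(window_bytes: int, compressed_fraction: float) -> float:
    """Average instructions in a fetch window of `window_bytes` bytes.

    `compressed_fraction` is the share of instructions (not bytes) that use
    the 2-byte "C" encoding; the rest are assumed to be 4 bytes long.
    """
    if not 0.0 <= compressed_fraction <= 1.0:
        raise ValueError("compressed_fraction must be between 0 and 1")
    # Average instruction size in bytes for this instruction mix.
    average_size = 2 * compressed_fraction + 4 * (1 - compressed_fraction)
    return window_bytes / average_size


if __name__ == "__main__":
    for window in (16, 32):
        low = instructions_per_window(window, 0.0)   # no "C" code at all
        high = instructions_per_window(window, 1.0)  # all "C" code
        # 0.6 is an assumed mix, chosen only for illustration.
        mixed = instructions_per_window(window, 0.6)
        print(f"{window} bytes: {low:.0f} to {high:.0f}, about {mixed:.1f} at 60% C")
```

With that assumed mix, a 32-byte window averages a bit over 11 instructions, which is in line with the "11 point something" figure above.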

It's a little harder to decode dual-length RISC-V code than fixed-length AArch64 code, but I've described designs here and on Reddit (and elsewhere) that show that at 16- or 32-byte-wide decode it's not enough harder to matter. Going much wider on decode usually isn't useful, because in that much code you very often hit a taken branch somewhere.
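For anyone wondering what the extra work actually is: the length of a RISC-V instruction is given by the low two bits of its first 16-bit parcel (`11` means 32-bit, anything else means 16-bit "C"). Below is a rough software model of finding instruction starts in one fetch window. It is only a sketch with hypothetical names, it ignores the reserved encodings longer than 32 bits, and it walks the window sequentially, whereas real hardware would examine every 2-byte offset in parallel. It is not meant to reproduce any particular design mentioned above.

```python
from typing import List, Tuple


def instruction_starts(window: bytes, first_offset: int = 0) -> Tuple[List[int], int]:
    """Return (start offsets in the window, first offset for the next window).

    `first_offset` is 2 when the previous window ended with the first half of
    a 32-bit instruction, otherwise 0. Assumes only 16- and 32-bit encodings.
    """
    starts = []
    offset = first_offset
    while offset + 2 <= len(window):
        starts.append(offset)
        # RISC-V code is little-endian, so the low bits are in the first byte.
        is_32_bit = (window[offset] & 0b11) == 0b11
        offset += 4 if is_32_bit else 2
    # A 32-bit instruction starting in the last parcel spills into the next window.
    return starts, offset - len(window)


if __name__ == "__main__":
    c_nop = bytes.fromhex("0100")    # c.nop, 16-bit encoding 0x0001
    nop = bytes.fromhex("13000000")  # addi x0, x0, 0, 32-bit encoding 0x00000013
    window = c_nop + nop + c_nop + nop + nop  # 16 bytes in total
    print(instruction_starts(window))
```

The example window mixes two compressed and three full-size instructions, so it yields five starts, inside the 4 to 8 range for a 16-byte window.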