Fusion is entirely optional in RISC-V, thus decoders do not need to implement it.
It does not help the largest implementations, that favor having a lot of small ops flying, nor the smallest ones, where fusion is unnecessary complexity.
But it might make sense somewhere in the middle.
Ultimately, it does not harm the ISA to be designed with awareness it exists.
> It does not help the largest implementations, that favor having a lot of small ops flying
There's lots of O(N^2) structures on a OoO chip to support in-flight operations, so if you can fuse ops (or have an ISA that provides common 'compound' operations in single instructions) that can be a big benefit.
For RISC-V the biggest benefit is probably to fuse a small shift with a load, since that is a really common use case (e.g. array indexing), and adding a small shifter to the memory pipe is very cheap. Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will instead start to use that one instead of assuming fusing will occur?
>Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will instead start to use that one instead of assuming fusing will occur?
B extension helps code density considerably. This is why, if the target has B, compilers will favor the shorter forms.
I was thinking specifically of the 'add with shift' instructions in the Zba extension (sh[123]add[.w] they seems to be called) vs. using separate shift + add instructions from the C extension and hope the CPU frontend will fuse them together. Code size would be the same, and assuming the CPU fuses them they should be pretty close performance-wise as well.
Instruction fusion can fix some things, but RISC-V also allows for both compressed and extended-length instructions. In general, they've been very careful about not wasting any of their limited encoding space.
>is basically 'instruction fusion will fix everything'.
Wherever I've seen people involved with RISC-V talk about fusion[0], the impression I got is the opposite; it is a minor, completely optional gimmick to them.
https://arxiv.org/abs/1607.02318 is a widely cited paper that argues that simple and orthogonal ISA's are preferable, as fusion can fix up those common cases like more complicated addressing modes that other ISA's have in the base ISA.
No, of course. It'd be interesting to see how much you can claw back with a practical implementation though.
Off the top of my head, it's not practical to fix RISCV's multi-precision arithmetic issues with instruction fusion. That's not a dominant use-case, especially if post-quantum crypto takes off, but it's definitely a place where RISCV underperforms ARM and x86 by a large factor instruction-for-instruction.
Also there are places where ARM's instructions improve code density, like LDMIA, STMDB etc. These of course can't be fixed by fusion.
64-bit RISC-V is the most dense 64bit ISA by a large margin and has been for a long time.
32-bit RISC-V is the most dense 32bit ISA as of the recent B and Zc work; it used to be Thumb2 was ahead.
All of this is without compromising "purity" or needlessly complicating decode; there already is a 8-wide decode implementation (Ascalon, by Jim Keller's team), matching Apple M1/M2 decode width.
Sure, that's fair. And I'm not saying RISC-V made the wrong choice. Just that lack of "impure" instructions like LDMIA does reduce code density, and can't be fixed through fusion, even if you can get ahead of Thumb2 with improvements elsewhere.
(Thumb2 is the most apt comparison here I think, since the instructions I mentioned are Thumb2 instructions.)