Is it more pragmatic? Their stance is basically 'instruction fusion will fix everything'.

chrisseaton · on Nov 15, 2022

This seems a reasonable and realistic stance though - instruction fusion is not theoretical and does scale pretty well in practice.

gpderetta · on Nov 15, 2022

We will see. I'm sure they can in principle make it work. Intel has shown that you can go a long way with increasingly complex decoders.

snvzz · on Nov 15, 2022

Fusion is entirely optional in RISC-V, thus decoders do not need to implement it.

It does not help the largest implementations, that favor having a lot of small ops flying, nor the smallest ones, where fusion is unnecessary complexity.

But it might make sense somewhere in the middle.

Ultimately, it does not harm the ISA to be designed with awareness that it exists.

jabl · on Nov 15, 2022

> It does not help the largest implementations, that favor having a lot of small ops flying

There are lots of O(N^2) structures on an OoO chip to support in-flight operations, so if you can fuse ops (or have an ISA that provides common 'compound' operations as single instructions), that can be a big benefit: each fused pair takes up one slot in those structures instead of two.

For RISC-V, the biggest benefit is probably to fuse a small shift with a load, since that is a really common use case (e.g. array indexing), and adding a small shifter to the memory pipe is very cheap. Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will start to use that one instead of assuming fusion will occur?

snvzz · on Nov 16, 2022

>Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will start to use that one instead of assuming fusion will occur?

The B extension helps code density considerably. This is why compilers will favor the shorter forms if the target has B.

jabl · on Nov 16, 2022

I was thinking specifically of the 'add with shift' instructions in the Zba extension (sh[123]add[.uw], as they seem to be called) vs. using separate shift and add instructions from the C extension and hoping the CPU frontend will fuse them together. Code size would be the same, and assuming the CPU fuses them, they should be pretty close performance-wise as well.
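To make the comparison concrete, here is a minimal Python model of both options, assuming RV64 (64-bit registers). The helper names are hypothetical, not a real library; the semantics follow the Zba spec as I understand it: `shNadd rd, rs1, rs2` computes `rs2 + (rs1 << N)`, and the `.uw` forms first zero-extend the low 32 bits of `rs1`. The array-indexing case from earlier in the thread is just the address of element `i` in an array of 8-byte elements.

```python
MASK64 = (1 << 64) - 1  # RV64 registers wrap at 64 bits


def slli(rs1: int, shamt: int) -> int:
    """slli / c.slli: logical shift left, truncated to 64 bits."""
    return (rs1 << shamt) & MASK64


def add(rs1: int, rs2: int) -> int:
    """add / c.add: 64-bit wrapping add."""
    return (rs1 + rs2) & MASK64


def sh_add(n: int, rs1: int, rs2: int) -> int:
    """Zba sh1add/sh2add/sh3add (n = 1, 2, 3): rs2 + (rs1 << n)."""
    return (rs2 + (rs1 << n)) & MASK64


def sh_add_uw(n: int, rs1: int, rs2: int) -> int:
    """Zba sh1add.uw/sh2add.uw/sh3add.uw: zero-extend the low 32 bits of rs1 first."""
    return (rs2 + ((rs1 & 0xFFFF_FFFF) << n)) & MASK64


# Address of a[i] for 8-byte elements (hypothetical base address and index).
base, i = 0x1000_0000, 5
two_insns = add(slli(i, 3), base)  # c.slli + c.add: 2 + 2 bytes
one_insn = sh_add(3, i, base)      # sh3add: 4 bytes
assert two_insns == one_insn == base + 8 * i

# RV64 keeps 32-bit values sign-extended in registers, even unsigned ones, so a
# 32-bit unsigned index of 0x8000_0000 looks like this; only .uw gets it right.
idx = 0xFFFF_FFFF_8000_0000
assert sh_add_uw(3, idx, base) == base + 8 * 0x8000_0000
assert sh_add(3, idx, base) != sh_add_uw(3, idx, base)
```

One caveat for the compressed pair: `c.slli` shifts its register in place (it expands to `slli rd, rd, shamt`), so the 4-byte `c.slli` + `c.add` form only matches `sh3add` in size when the unshifted index is dead afterwards; otherwise an extra `c.mv` is needed.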

zozbot234 · on Nov 15, 2022

Instruction fusion can fix some things, but RISC-V also allows for both compressed and extended-length instructions. In general, they've been very careful about not wasting any of their limited encoding space.
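An instruction's length is fixed by the low bits of its first 16-bit parcel, which is how compressed and longer encodings share one encoding space. A minimal sketch of that length decode, following the base length-encoding scheme in the RISC-V unprivileged spec; `insn_length` is a hypothetical helper, and encodings longer than 64 bits are deliberately not handled.

```python
def insn_length(parcel: int) -> int:
    """Length in bytes of the instruction whose first 16-bit parcel is `parcel`.

    Returns 0 for the >64-bit / reserved encodings, which this sketch skips.
    """
    if (parcel & 0b11) != 0b11:
        return 2   # bits[1:0] != 11: compressed (C extension)
    if (parcel & 0b11100) != 0b11100:
        return 4   # bits[4:2] != 111: standard 32-bit encoding
    if (parcel & 0b111111) == 0b011111:
        return 6   # bits[5:0] == 011111: 48-bit format
    if (parcel & 0b1111111) == 0b0111111:
        return 8   # bits[6:0] == 0111111: 64-bit format
    return 0       # longer or reserved encodings


assert insn_length(0x0001) == 2   # c.nop
assert insn_length(0x0013) == 4   # low parcel of addi x0, x0, 0 (nop)
```

For a core that implements only the C extension and the 32-bit base encodings, only the first check is ever needed.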

snvzz · on Nov 15, 2022

>Their stance

Citation needed.

>is basically 'instruction fusion will fix everything'.

Wherever I've seen people involved with RISC-V talk about fusion[0], the impression I got is the opposite; it is a minor, completely optional gimmick to them.

0. https://news.ycombinator.com/item?id=32614034

jabl · on Nov 15, 2022

https://arxiv.org/abs/1607.02318 is a widely cited paper arguing that simple, orthogonal ISAs are preferable, since fusion can recover common cases, such as the more complicated addressing modes that other ISAs include in their base ISA.
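As a rough illustration of "fusion can fix up those common cases", here is a minimal sketch of a decoder-side pair matcher. The `Insn` tuple, the idiom names and the greedy pairing policy are all hypothetical; the idioms are similar in spirit to those the linked paper discusses, but this is not its implementation, and a real decoder would match on raw encodings with further constraints.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    """Hypothetical, already-decoded instruction: mnemonic plus register numbers."""
    op: str
    rd: int
    rs1: int = 0
    rs2: int = 0
    imm: int = 0


def fuse_pair(a: Insn, b: Insn) -> Optional[str]:
    """Name the macro-op formed by (a, b), or None if the pair is not fusible.

    Every idiom here requires b to read a's result and overwrite the same
    register, so only the final value is architecturally visible and the
    pair can be tracked as a single op.
    """
    if a.rd == 0 or b.rd != a.rd or b.rs1 != a.rd:
        return None
    if a.op == "slli" and 1 <= a.imm <= 3 and b.op == "add":
        return "shift+add"          # slli rd, rs1, n; add rd, rd, rs2
    if a.op == "add" and b.op == "ld" and b.imm == 0:
        return "indexed load"       # add rd, rs1, rs2; ld rd, 0(rd)
    if a.op == "lui" and b.op == "addi":
        return "load immediate"     # lui rd, hi; addi rd, rd, lo
    return None


def fuse_stream(insns: List[Insn]) -> List[str]:
    """Greedily fuse adjacent pairs; returns one entry per resulting op."""
    out: List[str] = []
    i = 0
    while i < len(insns):
        fused = fuse_pair(insns[i], insns[i + 1]) if i + 1 < len(insns) else None
        if fused:
            out.append(fused)
            i += 2
        else:
            out.append(insns[i].op)
            i += 1
    return out


# x5 = t0, x6 = t1, x7 = t2, x10 = a0, x11 = a1
prog = [
    Insn("lui", rd=5, imm=0x12345),
    Insn("addi", rd=5, rs1=5, imm=0x678),
    Insn("slli", rd=6, rs1=11, imm=3),
    Insn("add", rd=6, rs1=6, rs2=10),
    Insn("ld", rd=7, rs1=6),
]
print(fuse_stream(prog))  # ['load immediate', 'shift+add', 'ld']
```

Five instructions become three ops in flight, which is the effect that matters for the O(N^2) structures mentioned above.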

GoOnThenDoTell · on Nov 15, 2022

Does it fix everything?

less_less · on Nov 15, 2022

No, of course. It'd be interesting to see how much you can claw back with a practical implementation though.

Off the top of my head, it's not practical to fix RISC-V's multi-precision arithmetic issues with instruction fusion: with no carry flag, every limb needs extra sltu/add instructions to propagate the carry. That's not a dominant use case, especially if post-quantum crypto takes off, but it's definitely a place where RISC-V underperforms ARM and x86 by a large factor, instruction for instruction.
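A minimal Python sketch of why the missing carry flag hurts: multi-limb addition with the carry recovered by unsigned compares, as base RISC-V code has to do it. The instruction sequence in the docstring is schematic (names like `a_i` and `cin` are placeholders), not actual compiler output, and `mp_add`/`to_int` are hypothetical helpers.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1


def mp_add(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (sum_limbs, carry_out). Each limb step mirrors roughly this RV64
    sequence:
        add  t0, a_i, b_i    # t0 = a_i + b_i (mod 2^64)
        sltu t1, t0, a_i     # t1 = carry out of the first add
        add  t2, t0, cin     # t2 = t0 + incoming carry
        sltu t3, t2, t0      # t3 = carry out of the second add
        or   cout, t1, t3    # at most one of t1, t3 can be set
    x86-64 does the same step with one adc, AArch64 with one adcs.
    """
    assert len(a) == len(b), "limb arrays must be the same length"
    out: List[int] = []
    carry = 0
    for x, y in zip(a, b):
        t0 = (x + y) & MASK64
        t1 = int(t0 < x)            # sltu t1, t0, a_i
        t2 = (t0 + carry) & MASK64
        t3 = int(t2 < t0)           # sltu t3, t2, t0
        out.append(t2)
        carry = t1 | t3
    return out, carry


def to_int(limbs: List[int]) -> int:
    """Reassemble little-endian 64-bit limbs into a Python int."""
    return sum(limb << (64 * k) for k, limb in enumerate(limbs))


x, y = [MASK64, MASK64], [1, 0]
s, c = mp_add(x, y)
assert to_int(s) + (c << 128) == to_int(x) + to_int(y)
```

That is five interdependent ALU instructions per limb versus one `adc`/`adcs`, and fusing them would mean recognizing a much longer idiom than the short pairs fusion typically targets.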

Also, there are places where ARM's instructions improve code density, like LDMIA, STMDB, etc. (load/store multiple registers with a single instruction). These, of course, can't be fixed by fusion.
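To show what a single load/store-multiple does, here is a minimal Python model of `STMDB Rn!` and `LDMIA Rn!` with writeback (the forms behind Thumb2 `PUSH`/`POP`). The function names are hypothetical, memory is modeled as a dict of words, and corner cases such as the base register appearing in its own list are ignored. On RV32 without extra extensions, the same push needs a stack-pointer adjustment plus one `sw` per register, and fusion, which happens after fetch, cannot shrink the code that was fetched.

```python
from typing import Dict, List

SP, LR = 13, 14  # AArch32 register numbers for the stack pointer and link register


def stmdb_wb(regs: List[int], mem: Dict[int, int], rn: int, reglist: List[int]) -> None:
    """Model `STMDB Rn!, {reglist}` (PUSH when Rn is SP): one instruction, N stores.

    Registers go to consecutive words ending just below Rn, lowest-numbered
    register at the lowest address; Rn is then written back. Assumes rn is
    not in reglist.
    """
    start = regs[rn] - 4 * len(reglist)
    for k, r in enumerate(sorted(reglist)):
        mem[start + 4 * k] = regs[r]
    regs[rn] = start


def ldmia_wb(regs: List[int], mem: Dict[int, int], rn: int, reglist: List[int]) -> None:
    """Model `LDMIA Rn!, {reglist}` (POP when Rn is SP): N loads plus writeback."""
    addr = regs[rn]
    for k, r in enumerate(sorted(reglist)):
        regs[r] = mem[addr + 4 * k]
    regs[rn] = addr + 4 * len(reglist)


regs = [0] * 16
regs[SP] = 0x2000
for r in (4, 5, 6, 7):
    regs[r] = 100 + r
regs[LR] = 0x8000
saved = list(regs)

mem: Dict[int, int] = {}
stmdb_wb(regs, mem, SP, [4, 5, 6, 7, LR])   # push {r4-r7, lr}
assert regs[SP] == 0x2000 - 20 and mem[regs[SP]] == 104
for r in (4, 5, 6, 7, LR):
    regs[r] = 0                              # clobber the saved registers
ldmia_wb(regs, mem, SP, [4, 5, 6, 7, LR])   # pop {r4-r7, lr}
assert regs == saved
```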

I'm sure there are other areas.

snvzz · on Nov 15, 2022

>code density

RISC-V excels in code density.

64-bit RISC-V is the densest 64-bit ISA by a large margin, and has been for a long time.

32-bit RISC-V is the densest 32-bit ISA as of the recent B and Zc work; previously, Thumb2 was ahead.

All of this is without compromising "purity" or needlessly complicating decode; there is already an 8-wide decode implementation (Ascalon, by Jim Keller's team), matching the Apple M1/M2 decode width.
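To illustrate why the variable-length encoding need not complicate wide decode much, here is a minimal sketch of finding instruction boundaries in a fetch block. It is a simplified model under the assumptions stated in the docstring, not how Ascalon or any particular core does it; `instruction_starts` is a hypothetical name.

```python
from typing import List


def instruction_starts(parcels: List[int], start: int = 0) -> List[int]:
    """Indices of the 16-bit parcels in a fetch block that begin an instruction.

    Assumes only 16-bit (C extension) and 32-bit instructions, and that
    decoding begins at parcel `start`. Whether a parcel would start a 32-bit
    instruction depends only on its own low two bits, so hardware can evaluate
    that for every parcel at once; the loop below is the remaining serial step
    that skips the second half of each 32-bit instruction.
    """
    is_32bit = [(p & 0b11) == 0b11 for p in parcels]  # independent per parcel
    starts: List[int] = []
    i = start
    while i < len(parcels):
        starts.append(i)
        i += 2 if is_32bit[i] else 1
    return starts


# c.nop | addi x0, x0, 0 (two parcels) | c.li a0, 0
block = [0x0001, 0x0013, 0x0000, 0x4501]
print(instruction_starts(block))  # [0, 1, 3]
```

The second parcel of the `addi` (0x0000) has low bits `00` and would look like a compressed instruction if examined alone, which is why the skip step is needed.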

less_less · on Nov 15, 2022

Sure, that's fair. And I'm not saying RISC-V made the wrong choice. Just that the lack of "impure" instructions like LDMIA does reduce code density and can't be fixed through fusion, even if you can get ahead of Thumb2 with improvements elsewhere.

(Thumb2 is the most apt comparison here I think, since the instructions I mentioned are Thumb2 instructions.)

gpderetta · on Nov 15, 2022

Sure, given a Sufficiently Smart Decoder.