
What about compared to RISC-V?


My impression is that RISC-V is designed more for hardware people, who pick extensions to build minimal cores tailored to their specific use cases, whereas a standard ARM core is designed for software people, with more features built in from the start.

AArch64 has many opcodes that take different parameters to become different effective instructions, often in cases where RISC-V would need an extension for the counterpart. I think a RISC-V µarch could do the same internally, but that would probably require a larger decoding stage.
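
A rough sketch of the kind of thing I mean (toy function; the instruction sequences are from memory and illustrative, not verified compiler output): AArch64 lets a shift ride along as an operand modifier, where base RV64I spells it out as a separate instruction.

    // x = a + (b << 2), everything 64-bit
    long add_shifted(long a, long b) {
        return a + (b << 2);
    }
    // AArch64: the shift is folded into the add's operand
    //   add x0, x0, x1, lsl #2
    // Base RV64I: separate shift, then add
    //   slli a1, a1, 2
    //   add  a0, a0, a1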


>but that would probably require a larger decoding stage.

The evidence so far points to the opposite.

Existing RISC-V µarchs from e.g. Andes and SiFive offer performance that matches or beats the ARM cores they're positioned against, with considerably lower power and significantly smaller area.

And they already have an answer for most of ARM's own lineup, from the smallest cores up to the higher-performing ones, excluding only the very newest, highest-performance cores, where the gap is already under 2 years.


I guess it depends on how many extensions mainstream versions of it end up using.


Leans towards cleaner in my experience.


Not if you add the extensions.


More pragmatic _and_ RISC-er.


Is it more pragmatic? Their stance is basically 'instruction fusion will fix everything'.


This seems a reasonable and realistic stance though - instruction fusion is not theoretical and does scale pretty well in practice.


We will see. I'm sure they can in principle make it work. Intel has shown that you can go a long way with increasingly complex decoders.


Fusion is entirely optional in RISC-V, thus decoders do not need to implement it.

It does not help the largest implementations, which favor having a lot of small ops in flight, nor the smallest ones, where fusion is unnecessary complexity.

But it might make sense somewhere in the middle.

Ultimately, it does not harm the ISA to be designed with awareness it exists.


> It does not help the largest implementations, which favor having a lot of small ops in flight

There are lots of O(N^2) structures on an OoO chip to support in-flight operations, so if you can fuse ops (or have an ISA that provides common 'compound' operations as single instructions), that can be a big benefit.

For RISC-V the biggest benefit is probably to fuse a small shift with a load, since that is a really common use case (e.g. array indexing), and adding a small shifter to the memory pipe is very cheap. Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will start to use that one instead of assuming fusing will occur?


>Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will start to use that one instead of assuming fusing will occur?

The B extension helps code density considerably. This is why, if the target has B, compilers will favor the shorter forms.


I was thinking specifically of the 'add with shift' instructions in the Zba extension (sh[123]add[.uw], as they seem to be called) vs. using separate shift + add instructions from the C extension and hoping the CPU frontend will fuse them together. Code size would be the same, and assuming the CPU fuses them, they should be pretty close performance-wise as well.
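
To be concrete, a hedged sketch (toy function; byte counts from my reading of the C and Zba specs, sequences from memory rather than actual compiler output): the address arithmetic is 4 bytes either way, so it really is just a question of whether you trust the frontend to fuse.

    long get(long *base, long i) {
        return base[i];           // base + (i << 3), then a load
    }
    // C-extension pair (2 + 2 = 4 bytes), hoping the core fuses them:
    //   c.slli a1, 3             // i <<= 3
    //   c.add  a0, a1            // base += scaled index
    //   ld     a0, 0(a0)
    // Zba form (one 4-byte instruction, no fusion needed):
    //   sh3add a0, a1, a0        // a0 = (a1 << 3) + a0
    //   ld     a0, 0(a0)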


Instruction fusion can fix some things, but RISC-V also allows for both compressed and extended-length instructions. In general, they've been very careful about not wasting any of their limited encoding space.


>Their stance

Citation needed.

>is basically 'instruction fusion will fix everything'.

Wherever I've seen people involved with RISC-V talk about fusion[0], the impression I got is the opposite; it is a minor, completely optional gimmick to them.

0. https://news.ycombinator.com/item?id=32614034


https://arxiv.org/abs/1607.02318 is a widely cited paper arguing that simple, orthogonal ISAs are preferable, since fusion can fix up common cases, like the more complicated addressing modes that other ISAs provide in their base ISA.


Does it fix everything?


No, of course not. It'd be interesting to see how much you can claw back with a practical implementation though.

Off the top of my head, it's not practical to fix RISC-V's multi-precision arithmetic issues with instruction fusion. That's not a dominant use case, especially if post-quantum crypto takes off, but it's definitely a place where RISC-V underperforms ARM and x86 by a large factor instruction-for-instruction.
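
To illustrate the multi-precision point, a minimal sketch in C (toy function; assuming base RV64 with no carry flag): each limb's carry has to be rebuilt with an unsigned compare, whereas AArch64's ADCS and x86's ADC keep it in the flags.

    #include <stdint.h>

    // 128-bit add, two 64-bit limbs per operand.
    // Without a carry flag (base RISC-V) the carry out of the low limb
    // is recomputed with a compare (an sltu), costing extra instructions
    // per limb; with ADCS/ADC it propagates through the flags for free.
    void add128(uint64_t r[2], const uint64_t a[2], const uint64_t b[2]) {
        uint64_t lo    = a[0] + b[0];
        uint64_t carry = lo < a[0];   // 1 if the low limb wrapped around
        r[0] = lo;
        r[1] = a[1] + b[1] + carry;
    }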

Also, there are places where ARM's instructions improve code density, like LDMIA, STMDB, etc. These of course can't be fixed by fusion.

I'm sure there are other areas.


>code density

RISC-V excels in code density.

64-bit RISC-V is the densest 64-bit ISA by a large margin, and has been for a long time.

32-bit RISC-V is the densest 32-bit ISA as of the recent B and Zc work; previously, Thumb2 was ahead.

All of this without compromising "purity" or needlessly complicating decode; there is already an 8-wide decode implementation (Ascalon, by Jim Keller's team), matching Apple M1/M2 decode width.


Sure, that's fair. And I'm not saying RISC-V made the wrong choice. Just that the lack of "impure" instructions like LDMIA does reduce code density and can't be fixed through fusion, even if you can get ahead of Thumb2 with improvements elsewhere.

(Thumb2 is the most apt comparison here I think, since the instructions I mentioned are Thumb2 instructions.)


Sure, given a Sufficiently Smart Decoder.



