
What about compared to RISC-V?


My impression is that RISC-V is designed more for hardware people, who pick extensions to build minimal cores tailored to their specific use cases, whereas a standard ARM core is designed for software people, with more features built in from the start.

AArch64 has many opcodes that take different parameters to become different effective instructions, often in cases where RISC-V would need an extension for the counterpart. I think a RISC-V µarch could do the same internally, but that would probably require a larger decoding stage.
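
A rough sketch of the kind of thing I mean (toy function; the instruction sequences are from memory and illustrative, not verified compiler output): AArch64 lets a shift ride along as an operand modifier, where base RV64I spells it out as a separate instruction.

    // x = a + (b << 2), everything 64-bit
    long add_shifted(long a, long b) {
        return a + (b << 2);
    }
    // AArch64: the shift is folded into the add's operand
    //   add x0, x0, x1, lsl #2
    // Base RV64I: separate shift, then add
    //   slli a1, a1, 2
    //   add  a0, a0, a1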


>but that would probably require a larger decoding stage.

The evidence so far points to the opposite.

Existing RISC-V µarchs from e.g. Andes and SiFive offer performance that matches or beats the ARM cores they're positioned against, with considerably lower power and significantly smaller area.

And they already have an answer for most of ARM's own lineup, from the smallest cores up to the higher-performing ones, excluding only the very newest, highest-performance cores, where the gap is already under 2 years.


I guess it depends on how many extensions mainstream versions of it end up using.


Leans towards cleaner in my experience.


Not if you add the extensions.


More pragmatic _and_ RISC-er.


Is it more pragmatic? Their stance is basically 'instruction fusion will fix everything'.


This seems a reasonable and realistic stance though - instruction fusion is not theoretical and does scale pretty well in practice.


We will see. I'm sure they can in principle make it work. Intel has shown that you can go a long way with increasingly complex decoders.


Fusion is entirely optional in RISC-V, thus decoders do not need to implement it.

It does not help the largest implementations, which favor having a lot of small ops in flight, nor the smallest ones, where fusion is unnecessary complexity.

But it might make sense somewhere in the middle.

Ultimately, it does not harm the ISA to be designed with awareness it exists.


> It does not help the largest implementations, which favor having a lot of small ops in flight

There are lots of O(N^2) structures on an OoO chip to support in-flight operations, so if you can fuse ops (or have an ISA that provides common 'compound' operations as single instructions), that can be a big benefit.

For RISC-V the biggest benefit is probably to fuse a small shift with a load, since that is a really common use case (e.g. array indexing), and adding a small shifter to the memory pipe is very cheap. Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will start to use that one instead of assuming fusing will occur?


>Alternatively, I think with RVA22 the bitmanip extension is part of the profile and IIRC it contains an add with shift instruction. So maybe compilers targeting RVA22 will start to use that one instead of assuming fusing will occur?

The B extension helps code density considerably. This is why, if the target has B, compilers will favor the shorter forms.


I was thinking specifically of the 'add with shift' instructions in the Zba extension (sh[123]add[.uw], as they seem to be called) vs. using separate shift + add instructions from the C extension and hoping the CPU frontend will fuse them together. Code size would be the same, and assuming the CPU fuses them, they should be pretty close performance-wise as well.
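
To be concrete, a hedged sketch (toy function; byte counts from my reading of the C and Zba specs, sequences from memory rather than actual compiler output): the address arithmetic is 4 bytes either way, so it really is just a question of whether you trust the frontend to fuse.

    long get(long *base, long i) {
        return base[i];           // base + (i << 3), then a load
    }
    // C-extension pair (2 + 2 = 4 bytes), hoping the core fuses them:
    //   c.slli a1, 3             // i <<= 3
    //   c.add  a0, a1            // base += scaled index
    //   ld     a0, 0(a0)
    // Zba form (one 4-byte instruction, no fusion needed):
    //   sh3add a0, a1, a0        // a0 = (a1 << 3) + a0
    //   ld     a0, 0(a0)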


Instruction fusion can fix some things, but RISC-V also allows for both compressed and extended-length instructions. In general, they've been very careful about not wasting any of their limited encoding space.


>Their stance

Citation needed.

>is basically 'instruction fusion will fix everything'.

Wherever I've seen people involved with RISC-V talk about fusion[0], the impression I got is the opposite; it is a minor, completely optional gimmick to them.

0. https://news.ycombinator.com/item?id=32614034


https://arxiv.org/abs/1607.02318 is a widely cited paper arguing that simple, orthogonal ISAs are preferable, since fusion can fix up common cases, like the more complicated addressing modes that other ISAs provide in their base ISA.


Does it fix everything?


No, of course not. It'd be interesting to see how much you can claw back with a practical implementation though.

Off the top of my head, it's not practical to fix RISC-V's multi-precision arithmetic issues with instruction fusion. That's not a dominant use case, especially if post-quantum crypto takes off, but it's definitely a place where RISC-V underperforms ARM and x86 by a large factor instruction-for-instruction.
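
To illustrate the multi-precision point, a minimal sketch in C (toy function; assuming base RV64 with no carry flag): each limb's carry has to be rebuilt with an unsigned compare, whereas AArch64's ADCS and x86's ADC keep it in the flags.

    #include <stdint.h>

    // 128-bit add, two 64-bit limbs per operand.
    // Without a carry flag (base RISC-V) the carry out of the low limb
    // is recomputed with a compare (an sltu), costing extra instructions
    // per limb; with ADCS/ADC it propagates through the flags for free.
    void add128(uint64_t r[2], const uint64_t a[2], const uint64_t b[2]) {
        uint64_t lo    = a[0] + b[0];
        uint64_t carry = lo < a[0];   // 1 if the low limb wrapped around
        r[0] = lo;
        r[1] = a[1] + b[1] + carry;
    }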

Also, there are places where ARM's instructions improve code density, like LDMIA, STMDB, etc. These of course can't be fixed by fusion.

I'm sure there are other areas.


>code density

RISC-V excels in code density.

64-bit RISC-V is the densest 64-bit ISA by a large margin, and has been for a long time.

32-bit RISC-V is the densest 32-bit ISA as of the recent B and Zc work; previously, Thumb2 was ahead.

All of this without compromising "purity" or needlessly complicating decode; there is already an 8-wide decode implementation (Ascalon, by Jim Keller's team), matching Apple M1/M2 decode width.


Sure, that's fair. And I'm not saying RISC-V made the wrong choice. Just that the lack of "impure" instructions like LDMIA does reduce code density and can't be fixed through fusion, even if you can get ahead of Thumb2 with improvements elsewhere.

(Thumb2 is the most apt comparison here I think, since the instructions I mentioned are Thumb2 instructions.)


Sure, given a Sufficiently Smart Decoder.



