I’ve had a lot of fun learning RISC-V assembly off and on over the last year. I went through a colleague’s tutorial series on implementing an operating system targeting RISC-V using Rust. Now I’m working on my own small assembler for RISC-V.
To me, a fun ISA is one which provides opportunities for golf, which basically means CISC ISAs with lots of different ways to do things. RISC-V is proudly, definitively not that, and that's a good design decision, but it means you'll never be able to replace a stretch of straightforward code with a handful of more obscure opcodes and complicated addressing modes.
The base ISA is very plain and RISCy. But I'd encourage you to go read the Bitmanip extension and find an opcode in there that doesn't have at least one obscure alternate use! =)
As for addressing modes, yeah I do wish we had a little bit more as a software person (but not as a hobby/toy cpu writer!). The current stuff can encode a surprising number of simple loops efficiently if you do it right, but of course if you come from x86 you'll have to do more with less...
> The base ISA is very plain and RISCy. But I'd encourage you to go read the Bitmanip extension and find an opcode in there that doesn't have at least one obscure alternate use! =)
Complex addressing modes are un-RISCy because they're harder to pipeline: Memory accesses might result in page faults, and if you have complicated addressing modes, all of a sudden you have some nontrivial ALU state to back out before you can handle the fault. It's a lot simpler to have rock-dumb load/store opcodes and then do all ALU work in terms of register-register opcodes which don't touch memory.
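To make that concrete, here's a rough sketch of how an x86-style "base + scaled index + displacement" load gets spelled out on RISC-V as separate ALU ops plus a plain load (register choices are just for illustration):

    slli  t0, a1, 3        # scale the index (8-byte elements)
    add   t0, a0, t0       # add the base pointer
    ld    a2, 16(t0)       # simple load with a small immediate offset
                           # (roughly: mov rax, [rdi + rsi*8 + 16] on x86)

If the load faults, only the ld has to be handled; the address arithmetic has already retired into ordinary registers.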
John Mashey (who helped design the MIPS architecture) has a great account, partway down this page, of how complex addressing modes are difficult to work with:
I can't link to it directly, but it starts with this text:
> General comment: this may sound weird, but in the long term, it might be easier to deal with a really complicated bunch of instruction formats, than with a complex set of addressing modes, because at least the former is more amenable to pre-decoding into a cache of decoded instructions that can be pipelined reasonably, whereas the pipeline on the latter can get very tricky (examples to follow). This can lead to the funny effect that a relatively "clean", orthogonal architecture may actually be harder to make run fast than one that is less clean. Obviously, every weirdness has its penalties.... But consider the fundamental difficulty of pipelining something like (on a VAX):
... and he then goes through a detailed, step-by-step account of what has to happen to execute that opcode in a world of page faults and unaligned memory accesses (another thing classic RISC chips were very down on, though I think they've gotten a bit looser about them in recent decades).
well except for push/pop, and all the microcoded stuff.
IMHO the x86 has been successful because it started out semi-RISCy: no indirect addressing modes, no auto inc/dec, only one memory access per instruction (except for push/pop and the microcoded segment stuff).
It means an instruction either goes to completion or it faults; there's no state change to undo if something takes a TLB miss. Compared with a 68k etc., it's an easy implementation.
They're a proposal only (and I don't think they're an official proposal), and having read them it seems the guy just shoehorned in anything he could think of. I'm curious to see how much of it will be accepted.
I can see why you'd say that yeah, it could probably use some trimming down of the less widely applicable instructions. But I don't otherwise think having a set of disparate, seemingly unrelated instructions is a bad thing for something like bitmanip, as long as they all pull their weight.
I think they do have an official working group. I remember reading a few months ago that their working GCC PoC had a ~4% dynamic instruction reduction in SPECint and some 5% perf improvement. Nothing world-shattering, but considering that the instructions are pretty lightweight to implement in hardware, the cost-benefit seems decent.
(Full disclosure: I am not a RISC-V member or particularly invested in bitmanip, just some random person going off of what I remember!)
> but you'll never be able to replace a stretch of straightforward code with a handful of more obscure opcodes and complicated addressing modes.
It just changes the target for a (human or computer) superoptimizer. Instead of using weird CISCy instructions, you're optimizing things like register allocation and the binary 'interfaces' between parts of your code. At a slightly higher level than a single insn, assembly coding is always about having "lots of different ways to do things"!
Sort of. The fun optimizations are a little different now: Can you choose your registers so you can make maximum use of Compressed instructions? And for higher performance chips, can you select sequences of instructions that trigger macro-op fusion?
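As a rough illustration of the first point (not tied to any particular toolchain): the compressed loads and stores can only name registers x8-x15 (s0-s1, a0-a5), so keeping hot values in that window is what lets the assembler pick 16-bit encodings:

    c.lw  a0, 0(a1)     # 16-bit encoding: a0/a1 are inside the x8-x15 window
    lw    t3, 0(t4)     # 32-bit encoding: t3/t4 (x28/x29) fall outside it

Assemblers normally choose the compressed forms automatically; the register allocation is what decides whether they can.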
Yeah, that’s a bit like saying you enjoy spelling bees, whereas a language whose spelling is idiosyncratic enough for competitive spelling to exist as a real thing has fundamentally failed at its job.
Orthography is orthogonal to a language, as transliteration proves. Putonghua is Putonghua regardless of whether it's written in Sinitic characters, Yale Romanization, or Pinyin. Also, don't run down someone else's hobbies. It's uncouth.
I’m not..? It’s fine to play code golf with CISC-y architectures. But being able to do that is an anti-goal to a compiler-friendly architecture, as RISC tries to be.
I grew up with the Z80, back in the day when business applications had a fair amount of Assembly in them. I used the 68000 and all x86 variants until the Pentium (by then it was too much effort to write Assembly by hand), and also had to deal with MIPS, PowerPC, SPARC, ARM and various bytecode formats.
x86 remains my favourite, for its tooling and its high-level opcodes with cool tricks.
Is there a std-variant for rv64g? And if not, how easy is it to setup for rv64gc? I have an emulator with a known rv64gc bug, and it's faster to emulate uncompressed instructions (until I start translating them, I guess).
Also, what language is the assembler written in? Is it embeddable?
Does anyone know how well RISC-V will be able to compete with ARM in the higher end? It seems pretty easy to come up with a processor design that is competitive in low-compute load applications, but what about more advanced designs? I'm curious what it is that ARM brings to the table there and if there is a chance that RISC-V will ever offer serious competition.
Certainly RISC-V does not need to be the best to survive as a viable option for most designs. It may absorb a lot of market share from ARM in the long run. It will be interesting to see how it develops.
RISC-V is a modular ISA: the base ISA contains only the minimum that must be implemented by all architectures.
The base ISA therefore excludes multiplication, division, floating point, atomics, SIMD, etc.
The base ISA can only be used for small, minimalistic microcontrollers.
For more advanced microcontrollers, ISA extensions add multiplication/division, single-precision floating point, ... (extensions M, F, ...).
For higher performance architectures you must add ISA extensions such as Atomics, SIMD/Vectors, ... (extensions A, P/V, ...).
(I intentionally exclude some details and other extensions to illustrate the benefits of the modular ISA)
Some of these extensions have a frozen spec, and some are still open.
Therefore RISC-V does not yet offer a complete solution to compete with ARM or x86.
And currently RISC-V is perfect to target microcontrollers.
However, RISC-V is designed to be competitive, as the instructions are designed to be compatible with existing optimizations for modern superscalar processor architectures.
So yes, there is a chance that RISC-V will become a serious competitor in the future.
Or fragmented like Linux distros, as it remains to be seen what extensions will be freely available, and certified as correctly implemented at the hardware level.
All extensions published by the RISC-V Foundation will be freely available, with verification tools.
Linux distributors are attempting to define a base spec (see the Unix Platform Spec group at the Foundation) which will define a minimum set of required extensions, and all other extensions will be optional and detected at runtime. We (Red Hat) are working with Debian, SUSE and Canonical on this.
It's possible in future that extensions will move from optional to required if they become popular and widely implemented. Apart from embedded (where, frankly, like ARM anything goes) Linux will not be fragmented.
I expect a future where, like Linux and BSD downstreams, there will be OEMs with their own proprietary extensions, using them as a means of business differentiation.
Oh for sure I expect private extensions - in fact there's even a namespace for them in the RISC-V specs. A bigger problem is not private extensions, but companies adding ad hoc MSRs and instructions, and in fact one company has already done that. Most likely we'll ignore these and try to work with the companies to get them to do the right thing in future.
Note that Intel also provides private extensions to their best customers. (I believe they are fused off in chips sold to general customers.)
I don't really understand your comparison with linux distros.
Of course extensions will be fragmented: each chip will only embed a fraction of the ISA according to its range of application.
But Linux distros are not made to be "modular".
Yes, only standard extensions will be supported by all compilers.
Why is this a problem?
If a company needs to design its own extension for its specific application, RISC-V allows it.
And it will never get upstream, because it is only relevant for this specific application.
And if this application is needed by many vendors, then it means that a standard extension must be designed.
Android is not fragmented; it is easy to write applications that run well on all Android versions. The problem RISC-V faces is that one or more companies start to grow a significant market with non-compliant chips, basically the browser war scenario.
> The problem RISC-V faces is that one or more companies start to grow a significant market with non-compliant chips, basically the browser war scenario.
The CPU market today is even more closed to competition than web browsers.
RISC-V does not "face this problem"; this problem already existed before RISC-V.
But RISC-V could make this market more competitive.
If there is a royalty free standard on top of which all modern CPUs are built (the RISC-V ISA), then a newcomer can easily use it and take advantage of the tools which go with it.
But companies remain free to diverge from this standard by adding their own non-standard extensions.
For advanced designs the instruction set really doesn't matter that much; the bulk of the silicon in a modern advanced CPU is spent doing things like caching, branch prediction, instruction scheduling, buffering loads and stores that may be unwound due to speculation, address translation, etc. These things are completely instruction set agnostic. To see how little it matters, look how well x86 processors work despite the x86 instruction set being a mess (even figuring out how long an instruction is in bytes is a nightmare), or how well advanced ARMv7 processors work despite that instruction set being a mess (a free shift on every immediate, urgh).
Now low-power is a completely different game. The bulk of the CPU is spent doing really quite basic stuff: fetching instructions, decoding instructions, simple ALU ops. Tiny differences to the instruction set can make a big difference to how much silicon (thus power) this takes. This is why RISC-V offers a 'reduced' variant (with only 16 registers): they know that for any low-power design the vast majority of silicon area is just going to be spent on flip-flops or latches for the 32-entry register file (what a waste, given the tiny performance bump in moving from 16 to 32 registers). That's a lot of leakage power for no reason.
Now RISC-V is not necessarily a great instruction set, but it's not really an awful instruction set either (from a pure perspective of implementing an efficient 32-bit CPU; I'm sure it has academic advantages). For example, nobody implements a RISC-V CPU without the 'C' extension (16-bit instructions), and given that, they really should have just made all instructions 16-bit with optional 16-bit or 32-bit payloads (typically immediates). This would have meant it was only a single instruction to load a 32-bit constant rather than two, and code density would be higher (thus less power spent fetching and decoding instructions). On the other hand you'd have a lot less extensibility available. Probably this would require 16 registers instead of 32 (not enough bits in the 16-bit instructions otherwise), but again this is a good thing for lower power in small CPUs.
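For reference, the two-instruction constant load mentioned above looks like this (taking 0x12345678 as an arbitrary example):

    lui   a0, 0x12345       # upper 20 bits
    addi  a0, a0, 0x678     # lower 12 bits

(and if bit 11 of the constant is set, the lui value has to be bumped by one to compensate for addi's sign extension).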
So RISC-V will never be the best instruction set for low-power 32-bit CPUs, but it's definitely better than x86 and ARMv7, and that's probably enough. And for 'advanced' designs it just doesn't matter, it's all about the quality of the implementation.
Personally I dislike RISC-V but I will still use it for implementing CPUs (and already have) due to the toolchain support. Designing your own instruction set is fun but porting a compiler is not, and RISC-V is sufficiently 'not bad' that it does the job. I imagine a lot of people feel similarly.
I like RISC-V and I like RV32E, the 16-register variant which is still 'OPEN' and not yet ratified. I think RVC is a very elegant compressed instruction approach; it can address 8 registers and is part of RV32E. RVC is far better than the Thumb or MIPS16 modes.
Yeah, I like RISC-V generally, especially the base and the vector extension. However, they really went off the rails with the Bit Manipulation extensions [1], still in draft form. Compare bfp, which mixes control and data in the same register, with ARMv8's bfi. It's only been in since the 0.92 draft, but it seems to me very un-RISC-like.
I think you'll find that cutting out 16 registers wins you less area than it costs you in code ROM; and at a given execution speed target, you'll be costing yourself in power and area to make up for it.
There may be some kinds of software where this is not an issue, but for a lot of common types of software, 16 registers just keeps losing.
It’s not about the die area, but rather the instruction encoding space. And because you can freely mix and match long and short instructions, in practice it doesn’t matter much.
The "Shakti" open source processor from the Indian Institute of Technologies(IIT) is worth looking at for a comprehensive roadmap and vision based on RISC-V. Take a look at the available documents/software at https://shakti.org.in/
I'd say that depends on the higher end of what? Control over the firmware, understanding of the actual bits you're flipping and the architecture, power consumption, or raw compute per clock?
The ISA design has very little impact on actual energy consumption compared with the actual CPU's uArch. The idea that it does is a conception that hasn't been true for at least 20 years.
It was somewhat true in the days where memory was expensive, and so smaller instructions were a good idea. But this came at the cost of increased area and thus energy (since you need more decoding logic).
As time went on, transistors shrank, so this cost kept decreasing; at the same time, memory got better and we developed techniques to hide the latency more, so the upside also kept decreasing.
Ultimately in modern times any difference in ISA results in some ~1% difference in energy consumption. You would gain far more efficiency (on the order of 2.5x) in just designing your uArch in a better way-- say, in-order vs OoO. [1]
In my opinion, the reason CISC vs. RISC is still very alive today is because ARM has done a fabulous job in marketing and executing itself as energy efficient. The big.Little architectures are an example in cementing this idea.
Conversely, Intel has done a similarly fabulous job in marketing and executing itself as performant, allowing both companies to live happily in their respective areas. The introduction of hyperthreading is another example of pushing this edge.
But these design decisions are NOT ISA intrinsic, and these misconceptions are changing incredibly fast (though I think ARM is doing a much better job than Intel at changing these paradigms).
> ARM has done a fabulous job in marketing and executing itself as energy efficient.
To be fair to ARM, they actually have produced energy efficient processors. I don't know a lot about ARM processor internals, but I've seen some presentations from many years ago and they had clever tricks for saving power while running, such as only charging some transistors during part of a clock cycle.
That doesn't mean the ISA is the reason for power efficiency, it means ARM designs have other tricks as well. Or at least, they used to.
I can tell you what's missing: a POPCOUNT instruction.
Yes, there is one in the proposed "B" extension. Not good enough: it won't be in the smaller chips, or even in the biggest ones for a good while.
But the article is interesting in noting all the stuff that we just don't see when we buy a gadget. The amount of sheer engineering effort that goes into real products that millions of people rely on is stunning. It takes a very large amount of actual money changing hands to generate the amount of invisible effort that is needed to support the things we use every day. The lack of that expenditure will make RISC-V adoption much slower than we might otherwise expect.
> Yes, there is one in the proposed "B" extension. Not good enough: it won't be in the smaller chips, or even in the biggest ones for a good while.
The lowRISC Ibex core, which I work on and which (whilst not the smallest core) is definitely at the small end, has full bitmanip extension support: https://github.com/lowrisc/ibex
Yes the extension isn't finalised yet but we're not expecting much change so we can tweak the implementation to match the ratified standard when it appears.
I'd also note bitmanip is broken up into subgroups so it's possible smaller cores will only implement the simple stuff (e.g. Popcount) to avoid the overhead of the full implementation.
I'm not the one you're asking. But... there's many a slip between spec and chip.
That is, creating a spec that says that it includes a POPCOUNT instruction is easier (except perhaps for politics and bureaucracy) than actually implementing one in a chip, going through layout, producing masks, and starting production. Then it takes a while for those chips to work their way through the supply chain to general availability.
How long? I don't know. But I suspect that it's a non-trivial amount of time.
How long until that instruction is available in all chips that are manufactured? Longer.
It's 2020. We've known how RISC machines evolve for three decades or so, yet RISC-V rewound the clock almost back to the beginning. "Let them eat extensions" is a recipe for fragmentation. MIPS actually experienced this in the field. ARM floating-point did for many years, too.
> "Let them eat extensions" is a recipe for fragmentation.
Not necessarily.
In machine mode (think firmware or hypervisor level) you can trap on unrecognized instructions and handle them after the fact in software, so maybe fragmentation won't be (at least technically) a big issue....
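Very roughly, and with the caveats that register save/restore and the actual decode/emulate routine are omitted, and that mtval capturing the faulting instruction bits is optional behaviour, an M-mode handler doing this looks something like:

    handle_trap:
        csrr  t0, mcause
        li    t1, 2               # exception code 2 = illegal instruction
        bne   t0, t1, not_illegal
        csrr  a0, mtval           # the instruction bits that trapped
        call  emulate_insn        # hypothetical software implementation
        csrr  t0, mepc
        addi  t0, t0, 4           # step past the emulated 32-bit instruction
        csrw  mepc, t0
    not_illegal:
        mret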
I remember the awful days of fake hard-float support in ARMv4. Trapping to the kernel and doing soft float there was 10x worse than just doing soft float in userspace in the first place.
Now, imagine doing that for xor-and-rotate, or bitfield insert/extract. Or leading zero count, or indexed memory access.
Exactly. Emulation would be even worse than omission. Putting in optimizations that make the slowest hardware run even slower is not the route to happy users.
In your last paragraph, you sound like you're agreeing with ncmncm (that it will be a while), but arguing that it doesn't matter. (Or are you arguing that we can't tell how to define "for a good while" because nobody is already using the chip for performance?)
I agree that the transition to RISC-V (if it's going to happen) will take time, but I do not agree that it will take even longer for the B extension. That's why I disagree with him.
Manifestly, the B extension isn't in any of the chips being produced now, to my knowledge. It's a proposed extension, not finalized. So, until it is finalized, I don't expect it to be implemented in production. We don't know how long it will be until it is finalized, but the draft had not been updated in more than a year, when I last checked. For all we know, it may never become a standard extension. Somebody said they were trying to get the POPCOUNT instruction removed even from the extension, or made an optional part of it.
It's a large extension, so (like other large extensions) it won't be implemented on the smaller chips that will also lack, e.g., floating point or vector units, or 64-bit registers and instructions, and for the same reason.
I don't know why I would need sources. What seems surprising about the statement that something that is not considered finished is not being built into chips, or that something huge will not be included in small chips?
> Manifestly, the B extension isn't in any of the chips being produced now
The RISC-V chips produced today are not made for performance or to be competitive; they are proofs of concept, toys and prototypes.
The current state of the RISC-V spec is not competitive with either ARM or x86. Therefore why use this as a comparison point?
> It's a proposed extension, not finalized.
Like many other extensions, such as the Packed SIMD one, the Vector one and the Hypervisor one.
These 3 are probably even more important than bit manipulation for performance and are not finalized either.
The RISC-V ISA is still under construction.
> It's a large extension, so (like other large extensions) it won't be implemented on the smaller chips that will also lack.
Did you know that it's stipulated in the RISC-V spec that instructions not present in the hardware can be emulated in software?
This makes it possible, in a small chip, to handle an extension as large as B without the need to implement all the instructions.
> For all we know, it may never become a standard extension. Somebody said they were trying to get the POPCOUNT instruction removed even from the extension, or made an optional part of it.
> I don't know why I would need sources.
You need a source because you make surprising assertions.
That OpenVDB data structure is a sparse hierarchical collection with a huge number of small 3D blocks of voxels.
They use bit masks in the blocks to track which voxels are empty, and which contain the payload data or higher resolution child blocks.
That’s why these bit manipulation instructions contributed that much to the overall performance of the library.
I've seen a similar approach in other software as well. Not necessarily for voxels; also 2D data, and trees where nodes use bits to track children.
Bitmap representations are seriously underused throughout the software world. I am used to getting 10x performance improvements where they are first introduced. A popcount instruction often accounts for much of that.
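A typical example of where the popcount shows up (register use is just illustrative, and this assumes the draft-B cpop mnemonic): turning a bit position in an occupancy mask into an index into the packed array of non-empty slots:

    li    t0, 1
    sll   t0, t0, a1        # 1 << bit_index
    addi  t0, t0, -1        # mask of all bits below bit_index
    and   t0, a0, t0        # occupancy bits below our position
    cpop  a0, t0            # rank = how many occupied slots precede ours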
It does, however, omit several important uses for the operation.

Perhaps the most important of these is the integer log-base-2 operation. The popcount instruction follows a bitwise operation to transform the input from the form 0..01x..x into 0..011..1. Often these operations are merged into another instruction, "count leading zeroes". We also find "count leading ones", "count trailing ones", "count trailing zeroes", all relying on the same hardware implementation. Most often the result of one of these operations is used as a shift count, or to provision storage for a corresponding number of larger objects.

The importance of the instruction in accelerating algorithms arises from its very slow loopwise implementation, so a software emulation is often worse than useless.
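To make the log-base-2 use concrete, here's a sketch of that transform for a non-zero 32-bit value in a0 (again assuming the draft cpop mnemonic for popcount):

    srli t0, a0, 1          # smear the highest set bit rightward
    or   a0, a0, t0
    srli t0, a0, 2
    or   a0, a0, t0
    srli t0, a0, 4
    or   a0, a0, t0
    srli t0, a0, 8
    or   a0, a0, t0
    srli t0, a0, 16
    or   a0, a0, t0         # a0 is now of the form 0..011..1
    cpop a0, a0             # popcount = floor(log2(x)) + 1

Without a hardware popcount, that last step degenerates into a loop over the bits, which is exactly the slow software fallback referred to above.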
The cores I compare the smaller RV cores to (the ARMv*-M cores) don't seem to have a popcount instruction either, not even in their new Helium vector ISA.
If it's a deficiency at that gate count, it's one RV doesn't have to compete on, it seems.
I'm saying the lack of a popcount isn't a deficiency, considering that its competitors don't have it either. And given the huge amount of work ARM put into the Helium ISA, I wouldn't be surprised if the instruction simply didn't make sense at that gate count anyway. For cores that go down to 12K gates, like a Cortex-M0 or the smaller RV32E cores, popcount in the ALU is a distinctly non-zero cost.
And yeah, you for sure need to be better to compete. Zero fees is a large chunk of what'll get you there for these tiny cores. When you're being compared on borderline ascetic area efficiency, adding features works against you.
I haven't heard anything specific. It may be because it is still a little too early and the Indian startups/companies are not familiar with chip/semiconductor company economics (huge upfront investment, long time periods, etc.). Hopefully, with the "Make in India" push, that might change. That said, there have already been a couple of tapeouts, and govt. agencies (ISRO and defence sector companies) are playing with them. A company, "Incore Semiconductors" (https://incoresemi.com/), in the model of "ARM Holdings", has also been formed to sell the technology.
Thanks. I don't think anyone can even think about fabrication, not just Indian startups; at this point TSMC is the de facto choice for anyone thinking about semiconductors.
IMO, we need to do what Espressif did with ESP8266/ESP32 for Shakti.
Incore semi sounds interesting, will check them out.
Actually, there seem to be two small fabs in India (of course nothing on the scale of TSMC); one in Chandigarh (Semi-Conductor Laboratory) and one in Bangalore (DRDO). See:
I was familiar with the work of SITAR, not so much about the other lab which I presume does similar work.
TSMC is a national asset of Taiwan; not just Taiwan but the Western powers treat it as a strategic asset, and unconfirmed sources say it's rigged to self-destruct in case the island gets invaded.
We need that kind of commitment and push towards the semiconductor industry.
Very frustrating reading the comments from the Aldec marketing head, and it's really disappointing to see such comments coming from a company which has previously engaged so much with the open-source community (they've actually put some effort into proper support for co-simulation using the Python-based cocotb).
For example:
> Q: What’s still missing out of the design flow? Do all the tools work with RISC-V?
> A: The main thing is the lack of UVM support.
This answer just seems like a category error: UVM is (for all intents and purposes) a library for SystemVerilog; an ISA does not "have UVM support", they just aren't even closely related. Perhaps he means that (open-source) implementations of RISC-V cores haven't been verified using UVM, or that open-source functional-verification tooling typically doesn't support UVM?
There are two reasons that no-one is using UVM in the open-source community: it's absolutely unbelievably dreadful, and full SystemVerilog support is generally lacking from open-source tooling. Unless you've got $$$$ to spend on a Big-3 simulator (let's be honest, no-one is using Aldec simulators to tape out an ASIC), you can't use UVM even if you wanted to.
> UVM would give you constrained random verification, functional coverage, and re-usability.
UVM (along with most of the "software" features of SystemVerilog, and pretty much everything else that's come out of Accellera) is a load of utter rubbish that unfortunately won't die. These advantages have absolutely nothing to do with UVM, or even to do with using SystemVerilog as a verification language: they can all be achieved by co-simulating designs with testbenches written in a better language, for example cocotb testbenches written in Python.
The reason why UVM is so prevalent has absolutely nothing to do with its suitability or superiority over other frameworks, and everything to do with the cabal of EDA vendors who are trying to maximise their profit: just consider that Cadence has net profits over double that of Arm, and it becomes apparent that the companies who use these tools are "small fish" compared to the companies making them, as perverse as this may be. Arm would much rather write RTL than tooling, so they're completely at the mercy of the Big-3: if Cadence says that UVM is the only way forward, Arm (and the rest of the industry) is pretty much compelled to follow.
> Along those lines, we see big problems on the business side. The open-source business model makes it very difficult for companies to make an investment. We do see open source becoming a huge movement, and we’ve already seen that happen in the software domain. But for EDA, and particularly hardware verification tools, we’re still gauging what we’re going to do with it. You need to make sure that whatever you invest is going to generate a return on that investment.
The subtext here is that EDA vendors absolutely don't under any circumstances want a strong open-source community in the hardware design space: as soon as the community reaches a critical mass, open-source EDA tooling might start to become a more-and-more viable alternative for use in industry, and the EDA vendors might actually have to start doing some work.
I'm convinced that the first place this will happen is formal verification: SiFive is already using Kami [1], which came from Adam Chlipala's group at MIT and is open-source. The costs of formal verification are absolutely worth it in expensive ASIC projects, but it seems EDA vendors (and even companies like Arm) have been really slow to see the light. There's a continuous stream of developments in this space coming out of academia which can be used for real-world projects.
The general state of the EDA industry is dire. My employer recently spent about $1.5MM renewing licences for a Big-3 simulator: this is the same tool which I can consistently make segfault in about 10 different ways. The quality of tooling is appalling, and the industry is incredibly slow to react. I'm really hopeful that projects such as RISC-V will stimulate the open-source hardware community enough that we might actually see some change in this space.
You have absolutely nailed it. Over the past 7 years, I have found EDA tooling frustrating to the point that RTL design/verification/structural engineers in my $megacorp company abandoned this entire industry and moved to software jobs. The system-on-chip design industry desperately needs to adopt an open source EDA tool chain, from architecture to synthesis and layout.
Fortunately we have had some amazing developments in this area over the past year, specifically the Bluespec compiler becoming open source, and SoC design and simulation frameworks such as Chipyard, based on Chisel and Verilator. Let's hope this open source movement gains momentum and allows rapid innovation in this space.
I'm not deeply familiar w/ what's available for RISC-V but it seems like all that comes in the box for testing is a black box reference design to bench against. The complaint about "lack of UVM support" may be a poorly stated reference to a need to write a lot of your own verif.
But there is no box! RISC-V is just an ISA; ARM also requires you to write a lot of your own verification, unless you buy VIP (possibly even more expensive than the architectural license) from Arm.
RISC-V does however have a complete formal specification, unlike other ISAs, which is absolutely not just a "black box reference design".
I'm not claiming there should be, but this is a thing people tend to complain about. This is ultimately why the market for hard IP drop-in ARM cores flourished.
A lot of the excitement I see around RISC-V seems to go like this: Wow, the RTL is open source! This is so great, anybody will be able to tape out their own custom core in like a week! People don't realize it's just a starting point, instead of a complete product.
As long as you are talking about RISC-V itself and not about a specific implementation, the RTL is neither open source nor not open source because there is no RTL. RISC-V is an instruction set specification.
As a CPU hobbyist, what this gives you is not a single step towards tapeout, but rather a working software-side toolchain (assembler, compiler, ...) right from the start, plus a sensible instruction set and encoding made by people who know what they are doing.
I've personally been having a lot of fun trying to write an rv32im processor in an HDL. I know people have already done this, but this is for a fun learning experience!
I actually just ordered a HiFive1 Rev B to play around with. Kinda excited for the potential of RISC-V, we'll see how much wind this article takes out of my sails. Thanks for sharing it.
The RISC-V spec includes recommended insn sequences for checking arithmetic overflow. With macro instruction fusion, these can easily be turned into hardware "overflow instructions" - it's purely an implementation detail.
What does that even mean? It has an integer add instruction, surely, and integer add can overflow. Do you mean you get no exception? or no status flag to check?
I vaguely recall risc-v having no status flags, instead preferring explicit instructions to check for such things. The idea is that, instead of implicitly paying for the checks on every operation whether or not you need it, you explicitly pay for it when you need it. At the hardware level, there's less interconnects and so all operations can be implemented simpler or faster.
Integer overflow on addition of two unsigned values can be detected by comparing the result to one of the operands to see if it's smaller, which can probably be done in one additional instruction. I'm sure there's suitably clever tricks for signed addition as well as subtraction. All stuff that a compiler can implement appropriately.
Maybe this limits the relative usefulness of risc-v silicon for programming languages that do a lot of safety checking. If your CPU spends 10% of its time doing additional runtime "safety" instructions vs. other architectures, but takes up 10% less space or runs 10% faster due to the simplicity, it seems like a wash though. Programs that don't need the runtime safety can then run 10% faster. Additionally, nothing is stopping someone from designing hardware that accelerates common instruction sequences. A CPU could still internally implement status flags and decode common compiler instruction sequences faster than otherwise.
EXCERPT
Overflow checking for unsigned addition requires only a single additional branch instruction after the addition: add t0, t1, t2; bltu t0, t1, overflow.
For signed addition, if one operand’s sign is known, overflow checking requires only a single branch after the addition: addi t0, t1, +imm; blt t0, t1, overflow. This covers the common case of addition with an immediate operand.
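The same section of the manual also gives the fully general signed case, which needs three extra instructions, using the observation that the sum should be less than one addend if and only if the other addend was negative; roughly:

    add  t0, t1, t2
    slti t3, t2, 0          # t3 = (t2 < 0)
    slt  t4, t0, t1         # t4 = (sum < t1)
    bne  t3, t4, overflow   # mismatch means signed overflow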
FLOATING POINT
As allowed by the standard, we do not support traps on floating-point exceptions in the base ISA, but instead require explicit checks of the flags in software. We considered adding branches controlled directly by the contents of the floating-point accrued exception flags, but ultimately chose to omit these instructions to keep the ISA simple.
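In practice that explicit check is just a CSR read after the operation; a minimal sketch (frflags is the standard pseudo-instruction for reading the fflags CSR, and the label name is just illustrative):

    fadd.d  fa0, fa1, fa2
    frflags t0              # read the accrued exception flags
    andi    t0, t0, 0x1c    # keep NV | DZ | OF (bits 4, 3, 2)
    bnez    t0, fp_error    # branch to software error handling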
> paying for the checks on every operation whether or not you need it
the actual overflow check is just the carry out... it's pretty cheap. putting it in a flags register creates dependencies. But there are presumably other approaches possible-- optional trapping, additional bits per output register, etc.
Status flags are a bit of a nightmare if you're building a massively superscalar implementation: every instruction has an implicit dependency on the one before. There have been a bunch of different solutions, which often cost instruction decode space. RISC-V says "usually we don't care, let's only take the cost when we really need it", which is a very RISCish sort of thing to do.
For those who haven't dabbled: RISC-V has compare-and-branch instructions rather than branch-on-condition instructions.
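So the comparison and the branch are a single instruction, rather than a flag-setting compare followed by a conditional jump, e.g. (label names are just illustrative):

    blt   a0, a1, take_it    # signed compare-and-branch in one instruction
    bgeu  a2, a3, skip_it    # unsigned variant; no condition codes involved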
RISC-V is designed to the momentarily fashionable idea that status flags are a big hurdle in optimizing instruction sequencing.
Modern cores rename status flags along with all the other registers, making them cheap. The tortuous workarounds in RISC-V are worse than useless -- they are a tax on all users, but particularly on resource-constrained users who don't get instruction reordering anyway.
In the future we will sneer at RISC-V's lack of implicit zero and overflow checks the way we sneer at SPARC's delay slots today.
> RISC-V is designed to the momentarily fashionable idea that status flags are a big hurdle in optimizing instruction sequencing.
> Modern cores rename status flags along with all the other registers, making them cheap.
RISC-V is designed in particular for superscalar processors.
Renaming status registers does not make them cheap in a superscalar microarchitecture.
Just as delay slots make superscalar microarchitectures even more complex.
All the microarchitectural state exposed by an ISA to allow optimization on scalar pipelined microarchitectures makes the implementation of superscalar microarchitectures more complex, since that state no longer represents what is actually happening in the hardware.
This explains why MIPS is so unpopular today: it is fundamentally incompatible with superscalar microarchitectures.
And this explains why RISC-V has chosen not to expose any microarchitectural states.
I see that you missed the point. People working on current x86 and POWER say status flags are no longer considered the problem they used to be. If you are investing the real estate to be superscalar, they are a trivial complication. If you are not, then the workarounds for their lack cost.
You are right, I didn't answer to your point. So I'm going to complete my thought.
I explain that RISC-V removes unnecessary complexity from the hardware and puts this unnecessary complexity on the software. All they did was move a problem from the hardware to the software.
And that's a good thing here, because overflow checks are (almost) never in the critical path of the application; they're (almost) only used for error handling.
As these checks are (almost) always out of the critical path, it doesn't matter; they can be handled in parallel.
And when they are on the critical path, the cost is definitely marginal.
This simplifies the hardware because it removes an implicit dependency between all instructions, leaving only the explicit register names as the dependency.
It is because of factors like this that RISC-V cores are smaller than x86 cores, for example. Which leaves more room for optimization.
But if for some applications this becomes mandatory (hardware overflow detection or a bounds checker), RISC-V can in the future also add an extension to handle it.
They could, for example, add a global status flag, or a status flag per register (which would remove the implicit dependency).
But for now this is not necessary.
You also talk about "tortuous workarounds". But the RISC-V manual talks about a "single instruction" in most cases.
> they are a tax on all users, but particularly on resource-constrained users who don't get instruction reordering anyway.
If they have made this CPU choice it is because they don't need performance, so it's not a real issue.
A small core does not demonstrate that performance does not matter. It demonstrates, exactly, constrained resources. Often performance becomes much more important because there is so little to spare.
If you need both performance and a small core (and not a superscalar one, because there are small superscalar cores),
I would definitely not recommend RISC-V; RISC-V is not designed for that at all.
You should rather turn towards some VLIW architecture (and RISC-V is not compatible with VLIW architectures).
You will also gain in predictability.
Because other companies do, so that's where the action is now.
What matters is can we benefit from each others' work, and is there a lot of work being done and shared by others, to benefit from.
It's a (business as well as developer) community, network effect thing.
You might as well ask, why did RISC-V succeed while OpenRISC failed?
The ISA doesn't really matter. It's not like the RISC-V ISA is mind-blowingly special or anything. It's sensible, it does the job about as well as others.
(I say this even though I'm an enthusiast and member of the RISC-V Foundation. I don't think the ISA is the special secret sauce, I think community and sharing are).
There are some great ISA extensions, but many of them came after RISC-V had already gained momentum. And those extensions are mostly because there's a diverse community of companies and developers producing them.
Another important advantage of RISC-V is academia. RISC-V has much more mindshare/backing from university researchers, having started at UC Berkeley with big names like David Patterson. OpenPower... not so much. And I strongly believe university research being on top of RISC-V is the biggest advantage in the long term: what that means is that masters and PhD students will be fully trained on the RISC-V ISA, and much new architectural research will already be based on the ISA.
OpenPower never had, and likely never will have, academic backing. Why would a university researcher choose OpenPower? The only scenario I can imagine is if they are pursuing funding from OpenPower companies, but OpenPower is already a small minority in ISA mind-share and market share, and I don't see any chance of them catching up.
It’s been so much more fun to learn than x86!
Edit: link to the tutorial (http://osblog.stephenmarz.com/)