Google, HP, Oracle Join RISC-V – Open-source processor core gains traction

zhemao · on Jan 3, 2016

Hi, I'm a Ph.D. student at UC Berkeley on the RISC-V team. Happy to answer any questions about the RISC-V ISA or Rocket, our open-source reference implementation.

Since there's always confusion about it, I'll start off by clarifying the difference between the two. RISC-V is an open-source ISA standard. Rocket is an implementation of this ISA which also happens to be open source. We do not intend for the ISA to be tied to a single reference implementation. The intention for RISC-V is to enable many different implementations (whether open-source or proprietary). These different implementations can all run an open-source software ecosystem, which currently consists of a GNU toolchain, LLVM, and Linux kernel port.

Natsu · on Jan 3, 2016

When can we expect to see the license terms? This claims that "The terms will specify a zero-royalty RAND license as well as verification suites licensees must run to use the RISC-V logo" but it appears that free (libre) implementations might not be permissible.

_chris_ · on Jan 4, 2016

RISC-V is a standard. Anybody can implement a CPU with the standard and then do whatever they want with that CPU. There are plenty of open-sourced RISCV cores already. Copyleft vs copyright is only an issue regarding how much of the extra "libraries"/layouts/foundry tooling has to be made available too.

Of course, somebody is also FREE to implement something that's "totally not RISC-V", and call it whatever they want. They can change one tiny thing or even nothing.

The major win here is:

a) if somebody WANTS to call their thing "RISC-V compatible", it must implement the same core ISA instruction set as everybody else and

b) all of these companies are pooling their resources regarding lawsuits or patent fights against the core ISA.

legulere · on Jan 3, 2016

The GNU toolchain, LLVM and Linux port seem not to be mainlined yet. Is there a plan when patches will get submitted for inclusion?

_chris_ · on Jan 4, 2016

It's a work-in-progress. But yes, expect it to happen Sometime (tm).

FullyFunctional · on Jan 4, 2016

Yes, this is actively being worked on, see the mailing list archives.

rwmj · on Jan 3, 2016

In reality, how far are you from having silicon?

_chris_ · on Jan 4, 2016

Berkeley has done 11+ tape-outs of RISC-V chips. LowRISC is looking to do a silicon run in a year-ish, and there's already one company that's publicly stated they've already been shipping RISC-V cores in their cameras.

jychang · on Jan 4, 2016

> RISC-V cores in their cameras

Got a source on that?

_chris_ · on Jan 4, 2016

I've talked with them in person at the RISC-V workshops. But for better proof, Slide 35: http://riscv.org/workshop-jun2015/riscv-intro-workshop-june2...

zhemao · on Jan 3, 2016

We have taped out several chips for our own research. But as a university research lab, we do not have any plans for a commercial manufacturing run. The lowRISC team can say more about the roadmap to a commercial dev board.

1ris · on Jan 3, 2016

Great! I heard somewhere here on HN that a modern x86 decoder is smaller than a modern arm decoder. Do you know if this is true? Also, how big is the decoder of risc-v and does it even matter? (I mean the size)

zhemao · on Jan 3, 2016

> a modern x86 decoder is smaller than a modern arm decoder

That's because the ARM ISA is not small either by any stretch of the imagination. On the other hand, the instruction listing of the base RISC-V ISA and the standard extensions can fit on a single powerpoint slide.

http://riscv.org/workshop-jun2015/riscv-intro-workshop-june2...

I wasn't involved in any of the recent tape-outs, so I can't say exactly how big the decoder is. But it's quite small relative to the other chip components. Currently, the integer pipeline of the chip is roughly the same size as the FPU, and these two together are roughly the same size as the L1 cache. All of those components together are smaller than the L2 cache (depends on the size of the L2 cache, though). So decoder size doesn't really matter in the grand scheme of things.

Decoder speed probably does matter, though. Currently, we can decode an instruction in a single cycle (1 ns). The x86 decoder, on the other hand, can take multiple cycles depending on instruction. But maybe this isn't a fair comparison since the instructions are decomposed into uops. I have no idea about the performance of ARM decoders.

wolf550e · on Jan 4, 2016

How can you be multiscalar with a decoder that only does 1 ops/cycle? Intel does 6:

> From the original Core 2 through Haswell/Broadwell, Intel has used a four-wide front-end for fetching instructions. Skylake is the first change to this aspect in roughly a decade, with the ability to now fetch up to six micro-ops per cycle. Intel doesn’t indicate how many execution units are available in Skylake’s back-end, but we know everything from Core 2 through Sandy Bridge had six execution units while Haswell has eight execution ports. We can assume Skylake is now more than eight, and likely the ability to dispatch more micro-ops as well, but Intel didn’t provide any specifics.

http://www.maximumpc.com/idf-2015-san-francisco-skylake-deep...

luismarques · on Jan 4, 2016

I think he meant that the decoding latency is 1 cycle, not that per 1 cycle the core can only decode one instruction.

That is, each baby takes 9 cycles to form, but per 9 cycles the population can have more than one baby.

Symmetry · on Jan 4, 2016

He was talking about latency, you're talking about throughput.

Symmetry · on Jan 3, 2016

Is that for a decoder that can decode multiple instructions per clock cycle? I think it would be somewhat interesting for a single instruction decoder but it would be quite remarkable for a decoder of greater width since x86 instructions aren't even self synchronizing (you can read the same sequence of bytes in different valid ways depending on where you start) while ARM is fixed width.

FullyFunctional · on Jan 4, 2016

There are two, Berkeley's BOOM and another from Macaque Labs in India (SHAKTI OO core).

Symmetry · on Jan 4, 2016

Those don't seem to be tiny x86 decoders, as far as I can tell.

orionblastar · on Jan 3, 2016

What operating systems have been ported to RISC-V so far?

I would assume GNU/Linux, Android, Solaris because of the companies involved. Am I correct?

justin66 · on Jan 3, 2016

If anyone has a RISC-V processor that will run Solaris, let alone a port of Solaris, they haven't announced it.

It really would be interesting to know why Oracle got involved, though, since their involvement is not the universal sign of pending success for an open source project.

richardwhiuk · on Jan 3, 2016

To be a cynic, one of the best ways to kill a project is to get involved and make a mess from the inside.

BogusIKnow · on Jan 4, 2016

Wasn't that in the CIA operations manual that surfaced some time ago?

https://news.ycombinator.com/item?id=4831363

rodgerd · on Jan 4, 2016

That would be my bet.

jerrysievert · on Jan 3, 2016

to be a non-cynic, oracle is going to want to make sure their stack runs as efficiently as possible on any platform that google and hp are backing. getting involved at the ground level allows them to provide input and to shape the architecture as much as possible.

justin66 · on Jan 4, 2016

No argument about hp, but why Google?

jerrysievert · on Jan 4, 2016

i call people in that position "frenemies". often times in the large enterprise space, even your largest rivals have something you want/need, whether you like it or not.

sometimes it is writing software that interoperates (bi and the like), other times it is direct need (java in the case of google and oracle, and i'm sure at least some backend systems).

orionblastar · on Jan 4, 2016

Google wants faster CPUs that use lower power and don't cost too much to be used in their server farms.

Sooner or later they will port Android to it so it can be used in smart phones and tablets.

justin66 · on Jan 4, 2016

We were speculating as to why Oracle would be interested in getting in bad with these guys on this architecture, specifically. Google is if anything an opponent of Oracle, so I assume Oracle's reasons for getting involved are somewhat different than what you've outlined.

vidarh · on Jan 4, 2016

It makes sense for Oracle to keep a close eye on new architectures that seems like it may get some traction. If it makes it into the server space, it'll affect them.

protomyth · on Jan 4, 2016

Given all the flap about HP & Oracle over the Itanium servers and Mark Hurd, I'm not sure HP is on Oracle's Xmas card list.

Qwertious · on Jan 7, 2016

It occurs to me that Oracle has a Java business, and Google has an Android business.

mkj · on Jan 4, 2016

It's important to keep Intel on their toes.

Sanddancer · on Jan 4, 2016

Intel's a sponsor too.

vidarh · on Jan 4, 2016

Intel also has an ARM architecture license (allowing them to design their own ARM cores).

Being involved in alternative architectures seems like a sound defensive move.

privong · on Jan 3, 2016

The article says:

> Currently RISC-V runs Linux and NetBSD, but not Android, Windows or any major embedded RTOSes. Support for other operating systems is expected in 2016.

No mention of Solaris, so I assume that would be forthcoming?

heshamelmatary · on Jan 4, 2016

There are initial/working ports of RTEMS (as an RTOS) as well as seL4 microkernel.

http://heshamelmatary.blogspot.co.uk/2015/12/rtems-port-for-...

FullyFunctional · on Jan 4, 2016

Linux is probably the focus, but there are ports in various stages for NetBSD, FreeBSD, and seL4 (and more I imagine).

FullyFunctional · on Jan 4, 2016

I'd add to that that there are already more implementations than I can count. Rocket isn't the only ASIC game in town, and there are countless soft-cores (FPGA implementations).

rsync · on Jan 4, 2016

Hello Zhemao,

I live in the SFBA. Is it possible to come to Berkeley and see the reference implementation running ?

Nothing special - I would just be fascinated to see something like this in person.

zhemao · on Jan 4, 2016

Um, it depends on what exactly you mean by "reference implementation" or "running". If you mean one of the silicon test chips, that might be hard to swing. I don't have them at my desk and they take a lot of setup to actually use (a setup process which I am unfamiliar with). I'd have to ask the grad student who worked on the bring-up to help me. But if you'd be satisfied with seeing the reference RTL run on an FPGA, that would certainly be possible.

FullyFunctional · on Jan 4, 2016

Or just run it yourself. I brought up Rocket on a Zedboard following the instructions. Rocket isn't ideal if your ultimate target is an FPGA, but IIRC it's currently the only that includes the virtual memory support that is needed to boot Linux.

zhemao · on Jan 4, 2016

BOOM also has virtual memory support. But yes, if you have a Zedboard, you can run the reference RTL yourself. Of course, those things are pretty costly, so I understand if you'd like to see someone else do it.

jeffdavis · on Jan 4, 2016

Does RISCV also specify privileged instructions? Or only the instructions an application would use?

_chris_ · on Jan 4, 2016

A privileged spec is available, although it's not frozen yet. A Linux port already exists.

asb · on Jan 3, 2016

There's been a fair bit of discussion about this over on reddit:

https://www.reddit.com/r/linux/comments/3z9t7k/open_source_p...

https://www.reddit.com/r/opensource/comments/3z5ue8/open_sou...

I'm a cofounder of lowRISC, a not-for-profit working to produce a fully open source SoC implementing the RISC-V ISA, in volume silicon. If you have questions then fire away.

modeless · on Jan 3, 2016

Not a question really, but a comment for anyone involved: please push to add integer overflow traps. No processor adds support because C doesn't require them, and as a result no language detects integer overflow by default because the processors make it slow. We need to break this cycle and it's not often that a new processor architecture comes around.

http://blog.regehr.org/archives/1154

asb · on Jan 3, 2016

This has been discussed before in the RISC-V community. See https://lists.riscv.org/lists/arc/hw-dev/2014-09/msg00007.ht... (sorry, you'll probably have to click the "I'm not a spammer link" first time and then load the link again - the riscv mailing lists seem to use the most painful mailing list software and Gmane wasn't archiving the riscv lists at that time).

One reason for push-back on implicit overflow checking is that it complicates superscalar designs by adding another exception source. The good news is that with an open ISA like RISC-V with high quality reference implementations, we can finally actually perform meaningful experiments to test these assumptions - adding different overflow checking semantics to a realistic implementation, quantifying the difference when putting it through the ASIC flow for a real process and making the matching changes to the compiler. It seems ridiculous that us in the computer architecture community haven't had the ability before.

luu · on Jan 3, 2016

There is no chicken and egg problem for most workloads. Processors are quite good at handling correctly predicted branches, and overflow checks will be correctly predicted for basically all reasonable code. In the case where the branch is incorrectly predicted (because of an overflow), you likely don't care about performance anyway.

See http://danluu.com/integer-overflow/ for a quick and dirty benchmark (which shows a penalty of less than 1% for a randomly selected integer heavy workload, when using proper compiler support -- unfortunately, most people implement this incorrectly), or try it yourself.

People often overestimate the cost of overflow checking by running a microbenchmark that consists of a loop over some additions. You'll see a noticeable slowdown in that case, but it turns out there aren't many real workloads that closely resemble doing nothing but looping over addition, and the workloads with similar characteristics are mostly in code where people don't care about security anyway.

modeless · on Jan 3, 2016

People who are actually implementing new languages disagree. Look at the hoops Rust is jumping through (partially) because they don't feel comfortable with the performance penalty of default integer overflow checks: https://github.com/rust-lang/rfcs/pull/146

kibwen · on Jan 3, 2016

That proposal for Rust was ultimately not accepted, here's what replaced it: https://github.com/nikomatsakis/rfcs/blob/integer-overflow/t...

TL;DR: There exists a compiler flag that controls whether or not arithmetic operations are dynamically checked, and if this flag is present then overflow will result in a panic. This flag is typically present in "debug mode" binaries and typically absent in "release mode" binaries. In the absence of this flag overflow is defined to wrap (there exist types that are guaranteed to wrap regardless of whether this compiler flag is set), and the language spec reserves the right to make arithmetic operations unconditionally checked in the future if the performance cost can be ameliorated.

modeless · on Jan 3, 2016

Yeah, I think Rust has probably made the right decision here, but it's frustratingly imperfect. This introduces extra divergence in behavior between debug and release mode, which is never good.

Note that there's even pushback in this thread about enabling overflow checks in debug mode due to performance concerns...

kibwen · on Jan 3, 2016

I'm hopeful that as an industry we're making baby steps forward. Rust clearly wants to use checked arithmetic in the future; Swift uses checked arithmetic by default; C++ should have better support for checked arithmetic in the next language revision. All of these languages make heavy use of LLVM so at the very least we should see effort on behalf of the backend to reduce the cost of checked arithmetic in the future, which should hopefully provide additional momentum even in the potential absence of dedicated hardware support.

luu · on Jan 3, 2016

If you read the thread, you'll see that the person who actually benchmarked things agrees: someone implemented integer overflow checks and found that the performance penalty was low, except for microbenchmarks.

If you click through to the RISC-V mailing list linked to elsewhere in this discussion, you'll see that the C++17 standard library is planning on doing checked integer operations by default. If that's not a "performance focused language", I don't know what is.

kibwen · on Jan 3, 2016

  > the C++17 standard library is planning on doing checked 
  > integer operations by default

In C++, wrapping due to overflow can trivially cause memory-unsafe behavior, so it's a pragmatic decision to trade off runtime performance for improved security. However, Rust already has enough safety mechanisms in place that integer overflow isn't a memory safety concern, so the tradeoff is less clear-cut.

Note that the Rust developers want arithmetic to be checked, they're just waiting for hardware to catch up to their liking. The Rust "specification" at the moment reserves the right to dynamically check for overflow in lieu of wrapping (Rust has long since provided types that are guaranteed to wrap for those occasions where you need that behavior).

  > someone implemented integer overflow checks and found 
  > that the performance penalty was low, except for 
  > microbenchmarks.

I was part of that conversation back then, and the results that I saw showed the opposite: the overhead was only something like 1% in microbenchmarks, but around 10% in larger programs. (I don't have a link on hand, you'll have to take this as hearsay for the moment.)

modeless · on Jan 3, 2016

The benchmark I see says up to 5% in non-microbenchmarks. A 5% performance penalty is not low enough to be acceptable as the default for a performance-focused language. If you could make your processor 5% faster with a simple change, why wouldn't you do it?

Even if the performance penalty was nonexistent in reality, the fact is that people are making decisions which are bad for security because they perceive a problem, and adding integer overflow traps will fix it.

luu · on Jan 3, 2016

As someone who's spent the majority of their working life designing CPUs (and the rest designing hardware accelerators for applications where CPUs and GPUs aren't fast enough), I find that when people say something like "If you could make your processor 5% faster with a simple change, why wouldn't you do it?", what's really meant is "if, on certain 90%-ile or 99%-ile best case real-world workloads, you could get a 5% performance improvement for a significant expenditure of effort and your choice of a legacy penalty in the ISA for eternity or a fragmented ISA, why wouldn't you do it?"

And the answer is that it there's a tradeoff. All of the no-brainer tradeoffs were picked clean decades ago, so all we're left with are the ones that aren't obvious wins. In general, if you look at a field an wonder why almost no one has done this super obvious thing for decades, maybe consider that it might be not so obvious after all. At zurn mentioned, there are actually a lot of places where you could get 5% and it doesn't seem worth it. I've worked at two software companies that are large enough to politely ask Intel for new features and instructions; checked overflow isn't even in the top 10 list of priorities, and possibly not even in the top 100.

In the thread you linked to, the penalty is observed to be between 1% and 5%, and even on integer heavy workloads, the penalty can be less than 1%, as demonstrated by the benchmark linked to above. Somehow, this has resulted in the question "If you could make your processor 5% faster ...". But you're not making your processor 5% faster across the board! That's a completely different question, even if you totally ignore the cost of adding the check, which you are.

To turn the question around, if people aren't willing to pay between 0% and 5% for the extra security provided, why should hardware manufacturers implement the feature? When I look at most code, there's not just a 5% penalty, but a one to two order of magnitude penalty over what could be done in the limit with proper optimization. People pay those penalties all the time because they think it's worth the tradeoff. And here, we're talking about a penalty that might be 1% or 2% on average (keep in mind that many workloads aren't integer heavy) that you don't think is worth paying. What makes you think that people would who don't care enough about security to pay that kind of performance penalty would pay extra for a microprocessor that gives has this fancy feature you want?

modeless · on Jan 3, 2016

> people aren't willing to pay between 0% and 5% for the extra security provided

This is not true. One problem is that language implementations are imperfect and may have much higher overhead than necessary. An even bigger problem is that defaults matter. Most users of a language don't consider integer overflow at all. They trust the language designers to make the default decision for them. I believe that most people would certainly choose overflow checks if they had a perfect implementation available, and perfect knowledge of the security and reliability implications (i.e. knowledge of all the future bugs that would result from overflow in their code), and carefully considered it and weighed all the options, but they don't even think about it. And they shouldn't have to!

For a language designer, considerations are different. Default integer overflow checks will hurt their benchmark scores (especially early in development when these things are set in stone while the implementation is still unoptimized), and benchmarks influence language adoption. So they choose the fast way. Similarly with hardware designers like you. Everyone is locally making decisions which are good for them, but the overall outcome is bad.

kibwen · on Jan 3, 2016

  > if people aren't willing to pay between 0% and 5% for 
  > the extra security provided

In the context of Rust, integer overflow checks provide much less utility because Rust already has to perform static and dynamic checks to ensure that integers are used properly, regardless of whether they've ever overflowed (e.g. indexing into an array is a checked operation in Rust). So as you say, there's a tradeoff. :) And as I say elsewhere in here, the Rust devs are eagerly waiting for checked overflow in hardware to prove itself so that they can make it the default and do away with the current compromise solution (which is checked ops in debug builds, unchecked ops in release builds).

adapteva · on Jan 11, 2016

Amen!

zurn · on Jan 3, 2016

There are areas where you could make a typical current processor "up to 5%" faster in exchange for dumping various determinism features provided in hardware that are conductive to software robustness in the same way as checked arithmetic. For example the Alpha had imprecise exceptions and weak memory ordering. The consensus seems to be against this kind of tradeoff.

zurn · on Jan 3, 2016

Is there a writeup somewhere on why the Rust people nevertheless decided against checked arithmetic?

The current RFC seems to be https://github.com/rust-lang/rfcs/blob/master/text/0560-inte... which seems to avoid taking a stand on the performance issue.

kibwen · on Jan 3, 2016

This RFC was the result of a long discussion that took place in many forums over the course of several years, so it's tricky to summarize. Here's my attempt:

1. Memory safety is Rust's number one priority, and if this were a memory safety concern then Rust's hands would be tied and it would be forced to use checked arithmetic just as it is forced to use checked indexing. However, due to a combination of all of Rust's other safety mechanisms, integer overflow can't result in memory unsafety (because if it could, then that would mean that there exists some integer value that can be used directly to cause memory unsafety, and that would be considered a bug that needs to be fixed anyway).

2. However, integer overflow is still obviously a significant cause of semantic errors, so checked ops are desirable due to helping assure the correctness of your programs. All else equal, having checked ops by default would be a good idea.

3. However however, performance is Rust's next highest priority after safety, and the results of using checked operations by default are maddeningly inconclusive. For some workloads they are no more than timing noise; for other workloads they can effectively halve performance due to causing cascading optimization failures in the backend. Accusations of faulty methodology are thrown around and the phrase "unrepresentative workload" has its day in the sun.

4. So ultimately a compromise is required, a new knob to fiddle with, as is so often the case with systems programming languages where there's nobody left to pass the buck to (and you at last empathize with how C++ got to be the way it is today). And there's a million different ways to design the knob (check only within this scope, check only when using this operator, check only when using this type, check only when using this compiler flag). In Rust's case, it already had a feature called "debug assertions" which are special assertions that can be toggled on and off with a compiler flag (and typically only enabled while debugging), so in lieu of adding any new features to the language it simply made arithmetic ops use debug assertions to check for overflow.

So in today's Rust, if you compile using Cargo, by default you will build a "debug" binary which enables checked arithmetic. If you pass Cargo the `--release` flag, in addition to turning on optimizations it will disable debug assertions and hence disable checked arithmetic. (Though as I say repeatedly elsewhere, Rust reserves the right to make arithmetic unconditionally checked in the future if someone can convincingly prove that their performance impact is small enough to tolerate.)

wyldfire · on Jan 4, 2016

> by default you will build a "debug" binary which enables checked arithmetic

The check failures trigger a panic?

Is there any work to enable an ASan-like feature for unsafe blocks BTW?

Gankro · on Jan 4, 2016

Yes, they trigger a panic. No, there's no ASan.

There isn't as strong a need for ASan in Rust because so little code is unsafe. Most of the time, the only reason you drop down to unsafe code is because you're trying to do something compilers are bad at tracking (or that is a pain in the neck to encode to a compiler). It's usually quite well-contained, as well.

You can work with uninit memory, allocating and freeing memory, and index into arrays in Safe Rust without concern already (with everything but indexing statically validated).

IMHO the kind of stuff `unsafe` is used for is very conducive to aggressive automated testing.

Marat_Dukhan · on Jan 3, 2016

MIPS has arithmetic instructions which trap on overflow. And guess what - gcc & clang use variants which DO NOT trap on overflow.

twoodfin · on Jan 4, 2016

Why? If I had to guess, it's because there's more than a little C code out in the wild that relies on "typical" signed overflow behavior.

renox · on Jan 5, 2016

I don't know: both compilers are trying to use 'undefined use' to optimize code (and too bad for you if this create problems for your application), so your explanation isn't coherent.. My explanation is: lack of interest/money: security is the thing that is always ignored, see the lack of funding of OpenSSL until recently..

renox · on Jan 4, 2016

has or had? I think that some MIPS variant deprecated the trap on overflow, if this is the case then gcc & clang behaviour is logical if not then it's un-comprehensible (especially since at the same time they justify f.. up your executable for 'optimisation' purpose if you have an undefined in your code).

Marat_Dukhan · on Jan 4, 2016

Forms with immediate operands were removed in MIPS Release 6, but the register-register forms are still there.

Symmetry · on Jan 3, 2016

There's always the Mill[1] which does have an add varient that detects them by default. Of course that isn't even out yet.

[1]http://millcomputing.com/

Someone · on Jan 4, 2016

"no language detects integer overflow by default because the processors make it slow"

Not quite 'no'. Of course there are languages that that dynamically switch to bignums, but I doubt that is what you mean. Swift does detect overflow (by aborting, IIRC). https://developer.apple.com/library/ios/documentation/Swift/...:

"If you try to insert a number into an integer constant or variable that cannot hold that value, by default Swift reports an error rather than allowing an invalid value to be created. This behavior gives extra safety when you work with numbers that are too large or too small."

devit · on Jan 3, 2016

Also consider instructions for efficient atomic reference counting, with traps on both inc (overflow) and dec.

In particular, they can have weaker ordering semantics and they can be buffered and elided among themselves (obviously with some sort of inter-core snooping).

And possibly support for "tagged" numbers, e.g. add integers if high bit is not set, call function otherwise, same for floats if not NaN, with a predictor for them.

vardump · on Jan 4, 2016

> Also consider instructions for efficient atomic reference counting, with traps on both inc (overflow) and dec.

Atomic reference counting is slow and gets even worse the more CPU cores and especially CPU sockets you have. If you can afford to make as expensive operation as an atomic add, you can definitely afford to add overflow checks. Atomic add is 50-1000+ clock cycles depending on contention, core/socket count and "moon phase" -- latency is somewhat unpredictable.

> In particular, they can have weaker ordering semantics and they can be buffered and elided among themselves (obviously with some sort of inter-core snooping).

I'm not sure how weak ordering semantics and fetch-and-add (atomic add) could mix. Aren't atomics about strong ordering by definition? Maybe there's something I don't understand.

> And possibly support for "tagged" numbers, e.g. add integers if high bit is not set, call function otherwise, same for floats if not NaN, with a predictor for them.

You'd still get branch mispredict which I guess you're trying to avoid. There'd be no performance improvement.

KMag · on Jan 6, 2016

If you want to get better performance out of dynamic language implementations that use NaN-tagging, you'll likely get better performance by adding one instruction that performs an indirect 64-bit load using 52-bit or 51-bit NaN-tagged addresses. The instruction should probably contain an immediate value for a PC-relative branch if the value isn't a properly formatted NaN-tagged address.

All languages would benefit from instructions to more efficiently support tracing of native code. A pair of special purpose registers (trace stack and trace limit registers) to push all indirect and conditional branch and call targets would really speed up tracing of native code a la HP's Project Dynamo. Presumably upon trace stack overflow the processor would trap to the kernel or call to userspace interrupt vector entry.

A small pseudorandom number generator and another pair of special purpose registers (stack and limit register) for probabilistically sampling the PC would make profiling lighter weight, both for purposes of human analysis of code and also for runtime optimization in JITs or HP Dynamo-like native code re-optimization.

Taniwha · on Jan 4, 2016

done right you'd not predict this branch, it's the exception that would get the mispredict

vardump · on Jan 4, 2016

Tagging is done to carry information about data type. Like to mark that float64 is actually a 32-bit integer.

Traps (CPU exceptions, such as traditional FPU exceptions like division by zero) usually involve kernel mode context switch. So if you trap on tag, the performance for tagged values will probably be 3-5 orders of magnitude slower. That's a lot.

renox · on Jan 4, 2016

> Traps (CPU exceptions, such as traditional FPU exceptions like division by zero) usually involve kernel mode context switch.

Could you explain why? I thought that trapping was more like a 'slow branch': slow due to the flush the pipeline but why should the kernel be involved(1)?

1: except if you need to swap in a page, but that's just like any other memory reference.

vardump · on Jan 4, 2016

When we're talking about x86, that's true in ring 0. Otherwise first thing CPU does is to enter privileged, ring 0 mode, save registers, jump through interrupt vector table and process the trap in kernel code. Trap handler will probably need to check usermode program counter and take a look at the instruction that caused the trap. No hard data, but I think we're talking about 1-5 microseconds.

Runtime/language exceptions have different mechanisms that don't require kernel context switches (but might involve slow steps like stack walk).

aardvark179 · on Jan 3, 2016

Lots of languages transparently promote to bignuns of various sorts when integer overflow occurs, clang and gcc expose builtins for checking overflow, etc. The only part here that is missing is the implementation of traps.

frik · on Jan 3, 2016

I haven't followed the progress the last few months, so sorry for my probably dumb questions:

What's the current status? (I read http://www.lowrisc.org/blog/2015/06/second-risc-v-workshop-d... (June 30, 2015))

Your site mentions a lot of "FPGA", do you have some actual silicon prototypes?

What's the current raw speed in MHz or FLOPS?

I read about OpenRISC, OpenSPARC, RISC-V, Z-Scale, BOOM - which are furthest in testing phase? Can we buy some of them in 2017? Or will it take longer?

asb · on Jan 3, 2016

Our most recent update was releasing an untethered version of the Rocket core http://www.lowrisc.org/blog/2015/12/untethered-lowrisc-relea.... Our next development goals are integrating the minion cores and integrating with third-party IP in order to produce the initial test chip, which we're intending to tape out later this year.

We haven't yet produced a silicon prototype, but will be taping one out this year. The Berkelely Rocket implementation has been silicon-proven multiple times as has the ETH Zurich PULP core which we also hope to use. The aim of this test chip is to integrate an LPDDR3 memory controller+PHY, plus USB host controller+PHY.

I don't have the link handy, but the Rocket implementation has clocked at 1.5GHz on a 45nm process.

For the final question, perhaps it's useful to define some of these terms:

* OpenRISC: an older 32-bit open ISA.

* OpenSPARC: The open-sourced design from Oracle. GPL-licensed. I don't know of anyone planning to produce a commercially available ASIC using it.

* Z-scale and BOOM are both RISC-V implementations from Berkeley. Z-Scale is a microcontroller-class RISC-V implementation and BOOM is an out-of-order implementation. Both make use of parts of the Rocket implementation (essentially using the codebase as a library). I believe only the base Rocket design has been produced in silicon so far. With lowRISC, we hope to discuss at the upcoming RISC-V workshop (this Tuesday and Wednesday) the status of BOOM, and whether it will make sense to use it as our application cores.

I hope we'll see commercially available lowRISC chips towards the end of 2017, but we'll be able to make a better judgement about how realistic that is once we reach our first test chip.

_chris_ · on Jan 4, 2016

"With lowRISC, we hope to discuss at the upcoming RISC-V workshop (this Tuesday and Wednesday) the status of BOOM, and whether it will make sense to use it as our application cores."

Hi Alex, I look forward to chatting with you guys about BOOM. =)

asb · on Jan 4, 2016

Definitely, looking forward to catching up with you and everyone else this week. With areas we're currently estimating, it seems we would have the area budget for 4x 2-wide BOOM cores (of course it's all quite rough at the minute). I'm keen to hear more on Tim Newsome's debug work and whether BOOM will also reap the rewards of it. See you Tuesday!

zhemao · on Jan 4, 2016

There has been one tape-out of Z-scale. An early iteration of it was used as a power management controller in one of our DVFS test chips. But we've made many changes since then and haven't taped it out since.

alain94040 · on Jan 3, 2016

Rocket implementation has clocked at 1.5GHz on a 45nm process

That's not bad. How many instructions can this issue per cycle?

asb · on Jan 3, 2016

The current Rocket implementation is single issue in-order, but it was designed as a dual issue in-order core.

zhemao · on Jan 3, 2016

> OpenRISC, OpenSPARC

Can't really comment, since these aren't our projects. I haven't really seen any developments on these fronts, though.

> RISC-V, Z-Scale, BOOM

RISC-V is the ISA. Rocket, Z-Scale, and BOOM are implementations of the ISA we've produced at Berkeley. Rocket is our reference implementation. It is a 64-bit in-order core. Z-scale is a small 32-bit core with no MMU intended for microcontrollers. BOOM is an out-of-order 64-bit core. They all share some common code, but BOOM is a bit behind the other two.

We have taped out different Rocket and Z-scale chips. But these run as tethered systems and were only meant for our research. As a university research lab, we do not really have any intentions for mass manufacture. ASB can answer better about when lowRISC chips will be commercially available.

0xcde4c3db · on Jan 3, 2016

Can you talk about how involved/committed chip vendors are in the lowRISC effort or RISC-V generally? I remember seeing mentions of STMicro in some of the Berkeley slides, but it wasn't clear to me whether they were just donating fab capacity/support or were in a more interested/collaborative role.

asb · on Jan 3, 2016

Some of the Berkeley test chips with RISC-V processors were done on ST's 28nm FD-SOI. There's not currently any larger role from ST or any other fab.

roymurdock · on Jan 3, 2016

> A permissive open-source license.

What specific license will you be releasing under? Is that still TBD?

Also, who is putting up the money for dev time and fabrication? Partners?

Exciting stuff. Thanks for sharing!

asb · on Jan 3, 2016

Changes to existing projects (e.g. Berkeley's Rocket RISC-V design) is released under the upstream project's license. We're interested in using an Apache-derived license for new hardware codebases due to the additional patent clauses.

We have received backing from a private individual which has got us going, and some additional donations from some of our project partners.

jacques_chester · on Jan 3, 2016

Companies paying into Foundations which own the IP under an Apache license is becoming a relatively well-tested pattern these days.

The Linux Foundation provides "Foundation as a Service" to several of these. I work for Pivotal, so I'm most familiar with the Cloud Foundry Foundation, which is administered by the Linux Foundation on behalf of members.

From what I can see, it's a smooth way to offload the administrative stuff to specialists while retaining the core openness.

ZenoArrow · on Jan 3, 2016

I remember hearing about lowRISC before, but haven't kept up with it. How have the last 12 months been for the project? Also, what process node are you currently targeting?

asb · on Jan 3, 2016

My response to a sibling question gives some details on recent developments. We're currently looking to target 28nm bulk CMOS.

0xcde4c3db · on Jan 3, 2016

Microsemi (the names "Actel" and "ProASIC" might be better-known) and Lattice are also mentioned as sponsors in the article itself. These are FPGA vendors, which suggests that they might be interested in shipping RISC-V soft cores in their design tools or possibly even shipping chips with RISC-V hard cores (similar to the Xilinx Zynq and Altera SoC families).

Notably, Lattice manufactures the only FPGA for which a full open source design flow currently exists (albeit unsupported by Lattice themselves) [1], and has its own set of open source soft processor cores [2] [3].

[1] http://www.clifford.at/icestorm/

[2] http://www.latticesemi.com/en/Products/DesignSoftwareAndIP/I...

[3] http://www.latticesemi.com/en/Products/DesignSoftwareAndIP/I...

petra · on Jan 3, 2016

I think the main reason the fpga companies aren't offering a low-cost , hard core mcu+fpga chip is that they are afraid about competing against the mcu companies.They can certainly reach the prices (Lattice claims a 50 cents fpga).And it's very affordable to license from ARM.

So i don't see this sector changing with an open-source core.

sitkack · on Jan 4, 2016

As sanddancer said, there are already high end parts from both of the top FPGA companies that include hard MCUs. Xilinx has been doing this for a really long time going back to embedded PPC. They wouldn't be competing against MCU companies, they are already competing against other programmable logic companies with the same features. The difficult part is the engineering and marketing in a way that doesn't cut into the profit margin of those low cost FPGAs. And RISCV provides a zero to low cost path to working netlists and toolchains. MCU+FPGA hybrids are absolutely the future in programmable logic it is only a matter of time before _all_ shipping fpgas have built in hard programmable control logic.

petra · on Jan 4, 2016

>> MCU+FPGA hybrids are absolutely the future

If so , it will probably come from china - they're building all the parts, are hungry(even at the state level) to achieve dominance , and don't care much about legacies, and have the market (probably backed by government) to support such strategy.

Sanddancer · on Jan 4, 2016

Altera, and Xilinx both offer FPGAs that have ARM cores in them for when the need arises. However, the FPGA market has entrenched itself fairly well into the DSP realm, and as such, worrying about an mpu at all just means chip area that would likely go to waste anyways.

Lattice on the other hand, is a more general purpose chip maker, so they may not want to bite the hand that feeds them. They've got a lot of products other than FPGAs, so keeping those lines working is probably more strategic than anything else.

JoachimSchipper · on Jan 5, 2016

MicroSemi offers FPGA+ARM in their SmartFusion chips. About RISC-V: consider asking them.

mwcampbell · on Jan 3, 2016

I wonder what Google, HP, and Oracle are planning to do with RISC-V. Will RISC-V-based chips be able to compete with Intel in the server market?

Incidentally, I think the first consumer devices to adopt RISC-V will be home wireless routers, because the stock firmwares for those are closed, so they'll just need to be recompiled with a new toolchain. And those devices don't need a GPU.

zhemao · on Jan 3, 2016

We do expect embedded systems to be the main use of RISC-V processors at the outset. In fact, I believe there is a consumer product out now - some kind of camera - which uses a RISC-V processor.

I don't think we'll be able to compete with Intel's x86 chips for a while. Mostly because Intel's silicon processes are more advanced than those offered by other foundries.

aurhum · on Jan 3, 2016

The AXIOM, a 4K camera uses a RISC-V

http://riscv.org/workshop-jan2016.html

asb · on Jan 3, 2016

I agree that the easiest entry-point for RISC-V is in systems which run embedded firmware of some sort (even if it's a builroot Linux rootfs). Even before you get to that, there are loads of companies that still have their own home-grown instruction sets used for auxilliary functions on a larger core. Think peripherals on an SoC, SSD controllers and so on.

mtgx · on Jan 3, 2016

Well, it may be easier to offer an open source RISC-V processor in products, but it's certainly not necessary.

From Wikipedia:

> The RISC-V authors aim to provide several freely available CPU designs, under a BSD license. This license allows derivative works such as RISC-V chip designs to be either open and free like RISC-V itself, or closed and proprietary, (unlike the available OpenRISC cores, which under the GPL, requires that all derivative works also be open and free).

https://en.wikipedia.org/wiki/RISC-V

So if it's not GPL, such as OpenRISC, then it's not guaranteed to be fully open source. You could still have 99% of the chip as open source and the other 1% as the proprietary backdoor.

detaro · on Jan 3, 2016

Is it in any way established what GPL means for hardware based on GPL-ed HDL code?

duskwuff · on Jan 3, 2016

It's not even clear what certain parts of the GPL mean when applied to interpreted languages, like Javascript. Applying the GPL to more exotic languages like HDLs is a legal black hole; I don't think even the FSF is entirely sure how that would play out.

(Which, of course, means that no sane manufacturer would touch GPLed hardware designs.)

bitwize · on Jan 3, 2016

I wonder what the copyright license terms on Oracle's proprietary ISA extensions will be...

thebeardisred · on Jan 3, 2016

Obligatory: ¨RISC architecture is going to change everything¨ - https://arikia.files.wordpress.com/2013/02/test2.gif?w=625

fmarch · on Jan 4, 2016

Has any one created a verification infrastructure with any of RISC-V implementations? The git repository does have a set of tests and benchmarks but could not find any thing more than that.

gluggymug · on Jan 4, 2016

Check out : https://github.com/ucb-bar/rocket-chip

Last time I complained here about the verification infrastructure of this project, someone gave that link.

I didn't like much of what I could find there.

What exactly are you trying to do?

fmarch · on Jan 4, 2016

I want to at the minimum compare the architectural state of the implementation on every retire with the ISA simulator (Spike).

In addition it would be good to have a set of assertions to maintain the sanity (read functional correctness) while experimenting with the design.

How are these implementations verified right now ? All I see are few assertions in chisel and small set of tests.

_chris_ · on Jan 4, 2016

You can get a commit log (PC, inst, write-back address, write-back data) from Rocket to compare against Spike's commit log. It's not documented because the verification story is still in flux, and the commit log is fairly manual (since there will be many false positives).

Comparing a real CPU against an ISA simulator is VERY HARD. There's counter instructions, there's interrupts, timers will differ, multi-core will exhibit different (correct) answers, Rocket has out-of-order write-back+early commit, floating point registers are 65-bit recoded values, some (required) ambiguity in the spec that can't be reconciled easily (e.g., storing single-precision FP values using FSD puts undefined values in memory, the only requirement being that the value is properly restored by a corresponding FLD).

We also use a torture tester that we'll open source Soon (tm).

fmarch · on Jan 5, 2016

Thanks for the information. I understand its a hard problem but an essential one that needs a solution. It needs to be supported both in Chisel and with appropriate infrastructure to test and compare with the golden ISA model. Is there any one in the community who is actively working on the verification story ?

_chris_ · on Jan 6, 2016

> Is there any one in the community who is actively working on the verification story ?

Not sure. I'd look out for videos to show up at the RISC-V workshop that's ongoing (http://riscv.org/workshop-jan2016.html).

The problem is verification is where the $$$ is, so even amongst people sharing their CPU source code, they're less willing to share the true value-provided of their efforts. A debug spec is being developed and will be added to Rocket-chip to make this problem easier.

With that said, MIT gave a good talk at the RISC-V workshop about their works on verification, and we open sourced our torture tester at (http://riscv.org/workshop-jan2016.html).

_chris_ · on Jan 7, 2016

https://github.com/ucb-bar/riscv-torture, stupid copy/paste.

gluggymug · on Jan 4, 2016

Not a lot of verification that I could find.

The tests are not extensive. Just hand written assembly code testing one thing at a time AFAIK. As someone who used to lead ASIC verification projects for a living, I expected a lot more at a minimum.

I don't know what the Chisel stuff checks. Chisel doesn't do anything at the simulation stage I believe.

I guess if you are just playing around with CPU designs, you can use this stuff.

I would never sign off on going to tape out with just this stuff though. Apparently they've taped out over 11 times though!

You want to run the RTL and compare against the ISA simulator? I think you're on your own...

tbirdz · on Jan 3, 2016

How does RISC-V compare with the J2 core based on the SuperH architecture by the Open Processor Foundation[1]?

1: http://0pf.org/

asb · on Jan 4, 2016

I answered a similar (but slightly different) question here: https://lobste.rs/s/jmgsyl/untethered_lowrisc_release/commen..., comparing lowRISC and the Open Processor Foundation.

wstrange · on Jan 3, 2016

Can someone explain the advantage of RISC-V vs. other open source architectures (e.g. OpenSPARC?).

Are there legal issues that make it more favourable?

FullyFunctional · on Jan 4, 2016

Lots have been written about that, but this might be a good starting point: http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-14..., but read also the spec (http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-62...) and pay attention to the footnotes.

IMO, compared to OpenSPARC, this is a fresh design that leverages lessons learned from everything that proceeded it. There are many problems with SPARC that makes it hard & expensive to scale up and down.

tw04 · on Jan 3, 2016

Well, I think the fact it's LGPL vs. GPLv2 is one point in favor of RISC-V. The other major factor I believe is that the RISC-V architecture is better suited for embedded type applications. I don't think you could (easily) scale down the OpenSPARC design enough to make it a suitable architecture.

zhemao · on Jan 3, 2016

The RISC-V ISA and Berkeley's open source reference implementations are BSD-licensed, not LGPL.

And yes, the RISC-V ISA is simpler than the SPARC ISA. We also have clearly separated the base integer ISA from the various extensions, such as floating-point, atomics, supervisor, etc.

tw04 · on Jan 4, 2016

Thank you for the correction, I was thinking OpenRISC (which is LGPL), vs. RISC-V (which is BSD).

jkot · on Jan 3, 2016

I believe OpenSPARC is a bit slow, this could be faster.

chei0aiV · on Jan 4, 2016

I wonder if HP are looking at RISC-V/lowRISC for their "The Machine" project.

Const-me · on Jan 3, 2016

I wonder how are they going to compete with Intel?

I don’t see how they’re going to match raw performance against those 1tflops+ 20-cores xeons. I don’t see how they’re going to match performance/watt against 7w quad-core C2338. I don’t see how they’re going to match price against $20 x5-Z8300. If none of the above, then why should Google/HP/Oracle/anyone else buy that?

jeffbush · on Jan 4, 2016

I don't think they need to compete with Intel.

The numbers I've seen suggests Intel ships around 300-400 Million processors per year[1]. In 2014, 12 Billon (with a B) devices with ARM cores shipped[2]. In 2011, around 500 Million MIPS based cores shipped[3].

These are obviously very different businesses. But it's clear there is an enormous market for licensable low-end IP cores that are used in everything from cars, cable boxes, TVs, cellphones (many phones have multiple cores internally to operate functions like the cellular baseband), etc. Most of the licensees are just gluing together licensed cores and contracting foundries like TSMC or Global Foundries to build them. The margins tend to be pretty thin (we're talking chips that are in the single digits for cost), so saving a little money on IP licensing is attractive. Since these are often custom applications, binary compatibility is less of an issue.

That seems like a more interesting market for an open ISA and open core processors.

[1] http://www.ft.com/cms/s/0/beff6e56-53ed-11e4-80db-00144feab7... [2] https://atlas.qz.com/charts/Ek18VmbP [3] http://www.forbes.com/forbes/2011/0509/technology-mips-sande...

Const-me · on Jan 5, 2016

Thanks, that sounds reasonable.

mtgx · on Jan 3, 2016

Maybe they think the more Intel conquers the markets it's in, the less competitive it will be performance/price-wise - like say how it replaced its $110 Core-based Celeron chips for Atom chips with half the performance and almost no extra "features" that also cost $110, because it knows people will keep buying "Intel" in the PC market and are not even giving AMD a second look now. It surely helped when Microsoft killed Windows RT and eliminated the ARM competition from the market for the foreseeable future.

Maybe these RISC-V chips will never get out of the labs of Google, but they could still serve to force Intel to keep prices the same on its new generations in the future, unless they want Google to really get serious about making its own chips.

Const-me · on Jan 5, 2016

“how it replaced its $110 Core-based Celeron chips for Atom chips” — they already reverted that decision. Latest Celerons 3855U, 3955U are Skylake i.e. 6th generation Core chips.

nickpsecurity · on Jan 4, 2016

Counter subversion concerns of various black boxes for SaaS operations.

Have customizable cores to add HW accelerators to like Cavium's Octeon III's do.

Things might get cheaper over time as big vendors buy in volume and that money goes back into enhancements.

Const-me · on Jan 5, 2016

Vendors won’t be buying from RISC-V, they’ll be buying from someone else who’ll use that open source design. By not paying for R&D and enhancements, that OEMs will be able to offer lower prices.

nickpsecurity · on Jan 5, 2016

You might be replying to the wrong comment by mistake as I listed a number of reasons people might like an open ISA and reference implementation. That OEM's will repackage it for those purposes doesn't contradict what I said.

igmor · on Jan 4, 2016

they will be competing with Apple's ARM product