Hacker News
SiFive U8-Series Core IP (sifive.com)
96 points by ingve on Oct 24, 2019 | 26 comments



This is a press release. There is a blog post that goes into a bit more technical detail here: https://www.sifive.com/blog/incredibly-scalable-high-perform...


The floating point unit is optional on the U8, which is a non-standard choice for an OoO design. Normally if you've got an OoO frontend, sticking in an FPU is a drop in the ocean (or can be).

Is this an artifact of them using BOOM (which is just an assumption on my part), since BOOM makes it ridiculously easy to add or remove backend execution units?


There are an absolute ton of customization options available. In the context of the decisions that can be made, making FP optional isn't surprising.

I do have to wonder about whether they've got tools for real customers that can effectively test the tradeoffs being made, at least for performance options like cache size & associativity. Even with all the options at your disposal and an intimate knowledge of your problem domain, beating a handful of well made and broadly tested off the shelf designs won't be easy.
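To make the tradeoff concrete, even something as "simple" as cache associativity can flip from irrelevant to decisive depending on the access pattern, which is why tooling that replays real workloads matters. A toy set-associative cache model (entirely hypothetical, for illustration only — `hit_rate` and the trace below are made up, not a SiFive tool):

```python
# Toy set-associative cache model with LRU replacement (hypothetical,
# for illustration only): replay an address trace against different
# associativity choices to see why the "right" parameters depend on
# the actual workload, not on a generic benchmark.
from collections import OrderedDict

def hit_rate(trace, size_bytes, ways, line_bytes=64):
    n_sets = size_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(n_sets)]  # per-set LRU order
    hits = 0
    for addr in trace:
        tag = addr // line_bytes
        lru = sets[tag % n_sets]
        if tag in lru:
            hits += 1
            lru.move_to_end(tag)         # mark most-recently-used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict least-recently-used
            lru[tag] = True
    return hits / len(trace)

# Four addresses that all collide in the same set of a 32 KiB cache:
trace = [i * 32 * 1024 for i in range(4)] * 1000
print(hit_rate(trace, 32 * 1024, ways=1))  # → 0.0 (pure conflict misses)
print(hit_rate(trace, 32 * 1024, ways=4))  # → 0.999 (all four lines fit)
```

The same trace goes from a 0% to a ~100% hit rate purely by changing associativity, while a different trace could show no difference at all.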

I love this stuff, but I'm a bit skeptical about whether it's actually something the market is asking for. If there were more substantial options like crypto accelerators, GPGPU, video encoding/decoding, FPGA, mixing big and lots of small cores, vector extensions, being able to size up the ALU for specific operations... then I could see the pain of custom silicon paying off.

Having accelerators sharing the cache hierarchy is not something the big players have really done much of yet and probably won't until there's so much dark silicon they can more or less throw the kitchen sink in.

https://scs.sifive.com/core-designer/customize/f9c3a7e1-08b7...


One benefit of having customers customize their own cores is that they can iterate their settings (on FPGA) in hours and test the result on their actual workload, rather than guessing how some semi-standard core might perform on their workload based on SPEC or CoreMark or something.

The announcement says the U87 with (Cray-like) vector processing will be available in H2 2020. (It's also similar to ARM SVE, but ARM hasn't shipped anything with SVE yet, and I don't think has even announced anything with it; Fujitsu has, in a supercomputer.) Meanwhile, the U84 has only scalar integer and FP, with nothing equivalent to NEON on ARM. That might or might not matter to you.
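What "Cray-like" means in practice is vector-length-agnostic code: software strip-mines the loop, asking the hardware on each iteration how many elements it will process, so the same binary runs on any implementation width. A Python sketch of the pattern (`VLMAX` and `vsetvl` are hypothetical stand-ins for the hardware vector register width and the RISC-V `vsetvli` instruction):

```python
# Sketch of the strip-mining pattern used by Cray-style / RISC-V "V"
# vector code: each loop iteration asks the hardware for a vector
# length, so the code never hard-codes the implementation's width.
VLMAX = 4  # hypothetical hardware vector length, in elements

def vsetvl(remaining, vlmax=VLMAX):
    # Models vsetvli: grant up to VLMAX elements per strip.
    return min(remaining, vlmax)

def vec_add(a, b):
    out, i, n = [], 0, len(a)
    while i < n:
        vl = vsetvl(n - i)  # elements processed in this strip
        out.extend(x + y for x, y in zip(a[i:i+vl], b[i:i+vl]))
        i += vl
    return out

print(vec_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
# → [11, 22, 33, 44, 55]  (two strips: 4 elements, then 1)
```

A wider implementation just changes `VLMAX`; the loop, like NEON-free RVV binaries, stays identical.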

Mixing a reasonable number of cores (up to nine) in the automated SoC configuration tool was announced alongside the 7-series (dual-issue, in-order) cores in October last year. That continues. If you want more cores than that, some engineering time would be required to verify it.

https://sifive.cdn.prismic.io/sifive%2F5ec09861-351b-420c-b6...

Adding QuickLogic eFPGA to a SiFive core has been announced:

https://www.prnewswire.com/news-releases/quicklogic-teams-wi...


Very cool, thank you!

While you can iterate pretty quickly on an FPGA, I'd still be worried that cutting down a chip to optimize for $/perf works until it doesn't. Something off the shelf would be pretty resilient to algorithm tweaks (ad absurdum you have an Intel chip which performs black magic to run almost anything you throw at it well).

It's pretty hard to overcome the economy of scale of just buying a mass produced chip that's twice what you need but ten times cheaper because they're made in batches of a million.
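The economy-of-scale point is just NRE amortization: fixed design/mask costs divided over volume dominate unit cost at small runs. A back-of-the-envelope sketch (all dollar figures are made up for illustration, not real mask-set or wafer prices):

```python
# Back-of-the-envelope NRE amortization behind the "mass-produced chip
# wins" argument. All numbers below are invented for illustration.
def unit_cost(nre_dollars, volume, marginal_dollars):
    # Fixed engineering/mask cost spread over volume, plus per-die cost.
    return nre_dollars / volume + marginal_dollars

custom = unit_cost(5_000_000, 10_000, 4.0)         # small custom run
commodity = unit_cost(5_000_000, 1_000_000, 4.0)   # million-unit batch
print(custom, commodity)  # → 504.0 9.0
```

At these (made-up) numbers the commodity part is ~50x cheaper per unit even if half its silicon goes unused, which is the "twice what you need but ten times cheaper" dynamic.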

I should note that my skepticism is coming from a place of "I want this to be a thing". Using the web page to design a chip is an instance of "live in the future and build what's missing" if I've ever seen one.


This is not to beat commodity processor/controller silicon. If you are going to build a SoC for whatever reason (peripherals, integration, etc) -- you might as well make the integrated processing exactly what you want.


An article on golem.de cites (Google translated) "According to Sifive, the U84 supports a heterogeneous core complex: four U84, four U74 and one S2 core can be combined."

The U84 is OoO, comparable to a Cortex-A72. The U74 is comparable to an A55. The S2 is comparable to a Cortex-M0/M3/M4 (depending on configuration) but with 64-bit registers and addressing.

https://scr3.golem.de/screenshots/1910/SiFive-U84-U87/thumb6...

https://www.golem.de/news/risc-v-sifives-u87-kern-nutzt-vect...


Depending on your use case, a worst-case execution time (WCET) analysis can yield these numbers. Such analyses are usually deployed for safety-critical systems (e.g. for ECUs in cars or industrial controllers), but they could be (ab)used for this case as well. Of course the analysis needs to support both your ISA (RISC-V) and the exact core implementing it, plus the adjusted parameters; this isn't exactly cheap, since it requires a lot of very specific know-how, and I suspect SiFive won't produce something like this. (Disclaimer: I work in the field.)
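The path-analysis half of WCET can be sketched simply: given per-basic-block worst-case cycle bounds (deriving those from the exact core's pipeline and cache behavior is the genuinely hard, tool-specific part), bound the program by the longest path through its control-flow graph. A toy sketch with an invented, acyclic CFG:

```python
# Toy sketch of WCET path analysis: longest path through an acyclic
# control-flow graph, weighted by per-block worst-case cycle counts.
# The CFG and cycle numbers below are entirely hypothetical.
from functools import lru_cache

# block -> (worst-case cycles for the block, successor blocks)
cfg = {
    "entry": (5,  ["cond"]),
    "cond":  (2,  ["then", "else"]),
    "then":  (40, ["exit"]),   # e.g. a path containing a slow division
    "else":  (7,  ["exit"]),
    "exit":  (3,  []),
}

@lru_cache(maxsize=None)
def wcet(block):
    cycles, succs = cfg[block]
    return cycles + max((wcet(s) for s in succs), default=0)

print(wcet("entry"))  # → 50 cycles: entry + cond + then + exit
```

Real tools additionally need loop bounds, infeasible-path pruning, and a cycle-accurate model of the specific core — which is where the ISA- and implementation-specific cost mentioned above comes from.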


I don't think the U8 is BOOM-based, but RISC-V decoders are pretty trivial, so they probably had some extra time and patience to make the rest of the frontend more flexible.


It certainly reads like it's based on BOOM; they're describing BOOM's gate-count niche to a tee. And their previous designs were based on Rocket.


So, is the U8 core based on BOOM, or is it a ground up design? Or if neither, then what is it?

And the HBM2E+ is some doohickey so customers can combine a RISC-V core with HBM2 memory?

(and if somebody from SiFive happens to read this, there's a typo, presumably HMB2E+ means HBM2E+)


Fixed now, thanks.


I still see HMB2E+.


Really fixed :-)


How can it support the vector extension? I didn't think the spec was done yet. Is it?


The U84, available to lead customers now, doesn't have the vector extensions. The U87, announced to be made available in H2 2020, will.

The vector extension spec is 99% done. There shouldn't be anything more than minor tweaks from now.

Note that part of the RISC-V extensions philosophy is that specs are not finalized until (preferably) multiple implementations have been made and shown to work as expected in the real world.


Is there another implementation of vector extension?


AFAIU, one of the nice things about the SiFive IP cores is that they were distributed under some kind of free license (although they still required other non-free IP to actually be used in a chip). Is this still (or was it ever) true?


AIUI, the U5 series is BSD/open-sourced as "Rocket", and a version of their shared L2 cache is available as the sifive-inclusive cache. I believe Berkeley uses these artifacts as part of their FireSim (https://fires.im) FPGA-based infrastructure.


Curious what about this technology makes it front page noteworthy? Has the deep learning community been hotly anticipating the arrival of this technology, or is it more along the lines of stuffing the ballot box?


It's the announcement of the highest perf RISC-V core that you'll be able to put your hands on so far.


Any stats for that? Being the highest implies there are lower ones. Who are the other RISC-V players?


Of what you can buy today: Kendryte K210, GigaDevice GD32V, and SiFive has FE310 and U54MC.


"you'll be able to put your hands on" ... the million dollar question: When???


If you're a company making your own SoCs already and want a high performance RISC-V core for a mid-range tablet or phone or an SBC equivalent to the Raspberry Pi 4 or something like that then the answer is "now".

If you're someone wanting to buy a quantity one chip from Digikey or Mouser then it'll be ... when someone decides to make and sell a retail chip. Which might well be someone making their own chip for a tablet or set-top box etc and selling the surplus production.


For sure. SiFive has been pretty good at delivering so far.



