HiFive1: A RISC-V-based, Open-Source, Arduino-Compatible Development Kit

TD-Linux · on Nov 30, 2016

I looked over the Chisel source code and the barebones datasheet [1]. This chip has a couple of unique features:

1. It doesn't have any onboard NVRAM (same limitation as the open-v). However, it does have a directly memory mapped quad-SPI peripheral and icache, which is a great alternative and might be better for applications that require a large amount of data. Note that an icache would still be required even if it had onboard NVRAM, because of its speed. You could also make swappable game cartridges, for example.

2. It has enough RAM and speed to run Opus.

3. The rest of the peripheral set is pretty barebones, no analog like the open-v has. No I2C or I2S without bitbanging, either.

4. The boot ROM has peripheral information stored in it. You might be able to have one binary that boots on different types of cores using this.

[1] https://dev.sifive.com/documentation/freedom-e300-platform-b...

rwmj · on Nov 30, 2016

I'm worried this leads to the zoo of random hardware that we see on ARM, which makes supporting Linux distros on ARM such a PITA.

It would be better if the basic hardware — serial ports and such — was in a standard location for all RISC-V machines, and all the rest of the hardware was discoverable at runtime (like PCs, mostly).

rwmj · on Nov 30, 2016

The parent comment has proven oddly controversial going down to 0 and up to +5 and back down again. So let me try to do better and clarify what I mean. For context, I am maintaining Fedora on RISC-V: https://fedoraproject.org/wiki/Architectures/RISC-V .

On a PC, there is a serial port at a standard location. It's not "discoverable", but every PC you care about has one at the same address, and it's incredibly easy to use with a tiny bit of assembler. You can always print messages, even in the earliest parts of early boot.

On non-PC platforms (I've used ARMv7, ARMv8, POWER) there's a zoo of serial ports. There are at least a half dozen different bits of hardware, on different ports, self-describing, not discoverable at all, or mentioned in device tree in multiple different ways. Some even require PCI, so they need ugly hacks to output early boot messages.

Critically, if you get the wrong serial port, you cannot see any boot messages at all. There's no way to tell what's going wrong.

So I think, for serial ports, the PC approach is definitely, clearly better.

For other hardware, it should just be self-describing. PCI and USB are the model examples here. And in fact, why not just use those?

microcolonel · on Dec 2, 2016

I agree on the serial port thing. That's one device which I think should be architecture-standard. I can't imagine bringing up firmware or debugging a kernel without a low-level system serial port. It also makes remote management a lot easier, because the remote console can hook into that in a standard fashion.

aray · on Nov 30, 2016

It looks like we're moving towards a solution (in a slow, jerky, shambling way) with replacing hardcoded boardfiles with device trees.

If RISC-V does this well out of the gate (like with IBM's old school Open Firmware) and has great shiny device tree support, I think we might end up like x86 -- where a single thumbdrive can boot almost any x86 machine under the sun.

BuuQu9hu · on Nov 30, 2016

HiFive1 isn't something you would run a Linux distro on anyway, especially since the ISA is still evolving and it will be using an early version.

kristoffer · on Nov 30, 2016

Or you can solve it as done in GRLIB (www.gaisler.com/index.php/downloads/leongrlib), with having all on-chip peripheral information (memory & irq map, etc) specified in a ROM configuration area (which is automatically generated at synthesis time).

pawadu · on Nov 30, 2016

or we could continue using device trees, which are much more expressive and were actually designed for this purpose.

kristoffer · on Nov 30, 2016

Device trees need to be written to match the hardware, compiled and supplied to the kernel. So not really as nice as "boot this kernel on any hardware".

Device trees are needed because manufacturers did not create self describing hardware.

pawadu · on Nov 30, 2016

> Device trees are needed because manufacturers did not create self describing hardware.

This is impossible to do beyond trivial components. Most devices are complex systems of interacting components.

Also, do you want the same lazy manufacturer that couldn't bother to create a device tree create a complete hardware description ROM and get it right in the first attempt?

rwmj · on Nov 30, 2016

In practice "self describing" doesn't mean the hardware completely describes everything about itself. It only needs to describe enough that the OS can load the right driver and the driver can locate the hardware address, interrupts and so on. After that the complexity resides in the driver itself. PCI has been doing this sort of thing successfully for two decades, so we know it's possible. ACPI has been doing the same thing for power management, clocks, power zones, suspend etc, again for something like two decades.
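
A minimal sketch of that "describe just enough" idea, loosely modelled on PCI-style ID matching. The table, the helper names and the device dictionaries are hypothetical; the two vendor/device IDs are well-known PCI IDs, used here only as examples.

```python
# Illustrative sketch of ID-based driver matching. Nothing here is a real
# kernel API; the names are made up for the example.

# (vendor_id, device_id) -> driver name
DRIVER_TABLE = {
    (0x8086, 0x100E): "e1000",    # Intel 82540EM gigabit NIC
    (0x10EC, 0x8139): "8139too",  # Realtek RTL8139 fast Ethernet NIC
}

def match_driver(vendor_id, device_id):
    """Return the driver name for a device, or None if nothing matches."""
    return DRIVER_TABLE.get((vendor_id, device_id))

def probe(devices):
    """Pair each discovered device with a driver and its resources.

    `devices` is a list of dicts holding the little that the hardware
    reports about itself: IDs, a base address and an interrupt line.
    """
    bound = []
    for dev in devices:
        driver = match_driver(dev["vendor"], dev["device"])
        if driver is not None:
            bound.append((driver, dev["base"], dev["irq"]))
    return bound
```

The hardware only reports IDs, an address and an interrupt; everything device-specific stays inside the driver that the lookup selects.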

pawadu · on Dec 1, 2016

In theory, maybe. But have you seen the corresponding code in the linux kernel when they try to probe a partially known component and at the same time go around all kinds of bugs and errors and differences in silicon revisions? All those magic numbers and timeouts and dependencies that no one really can explain?

Configuration pages are a thing of the past; nowadays they should at best be used for confirmation and sanity checks. Device trees are the future and have already improved and simplified hardware management a lot (especially on ARM).

kristoffer · on Nov 30, 2016

So x86 has no non trivial components?

It would probably force the hw manufacturer to think through its design a little bit more, which would be a good thing. I've seen enough of these SoCs with "complex system of interacting components" to feel that a well thought out design that needs less static description of SoC/board/cpu level details would be beneficial.

ChuckMcM · on Nov 30, 2016

You can bit bang a lot with a 320MHz clock :-). That said I wonder if we'll see a common standard emerge for an 'open' serial protocol. Sort of PCIe lite which runs at say 250MHz over a pair of LVDS pins that you make into an AHB bus master in the memory matrix. That would really open up the peripheral market.

trsohmers · on Nov 30, 2016

The frequency limit for bit banging is usually limited by the GPIO pins (well, their drivers) rather than the core clock. I use some ~250MHz PIC32MZ's with their GPIOs capping out at around 50MHz, and even the raspberry pi with a core clock of ~1GHz can only bit bang with its GPIOs up to ~60MHz without running into problems. As for LVDS, it takes up more area and power, and you usually need to license the IP block rather than it being available for free from the standard cell library provider.

H3g3m0n · on Dec 1, 2016

Is there really a need? Most serial protocols can really be considered 'open'.

For low bandwidth there is RS232/UART/JTAG. Then there is I²C/SPI for networked. I²C ranges from 100kbit/s to 2.3Mbit/s. And SPI goes to 10Mbps/30Mbps.

For higher bandwidth stuff there is whatever you want over USB, Bluetooth, 802.11BLAH Wifi. Etc... Afaik USB can be considered 'open', you might need to 'register' for a certification/vendor id/logo but from what I understand if you don't want those you don't need to bother. There are also http://pid.codes/ and other organisations that are giving away free pids.

There is the Wishbone bus. https://en.wikipedia.org/wiki/Wishbone_(computer_bus)

But that's got quite a few pins. It allows for on chip networking as well as external stuff. Also RapidIO (which has heaps of pins).

There is a RISC-V debugging standard, but I think that's protocol agnostic.

MagerValp · on Nov 30, 2016

My experience is mainly from 8-bit CPUs, but unless you can afford one IRQ per bit, won't you just be busy waiting at 320 MHz instead? Performance tends to be less of a problem than the fact that you can't do anything else at the same time.

TD-Linux · on Nov 30, 2016

Too late to edit, but I meant to link to this page, which is more specific about this chip's implementation: https://dev.sifive.com/documentation/freedom-e310g-0000-manu...

makomk · on Nov 30, 2016

The ESP8266 and ESP32 have a very similar memory mapped quad-SPI setup. The (older, cheaper, more limited) ESP8266 is probably the most similar in terms of peripheral set etc, though that obviously still has more RAM, WiFi, I2C and I2S which this lacks.

mattthebaker · on Nov 30, 2016

Regarding point 1, this is a terrible design decision. Any low power or high performance uC that requires code in external SPI loads it into and executes from RAM. At 32MHz it starts to make sense to have an ICACHE, with EMBEDDED Flash on a parallel bus. At 320MHz you want to cache your RAM. With 320MHz and external SPI (not even fast parallel NOR), you have to be insane. A single instruction cache miss will cost you 100s of clock cycles for a SPI read.

I really hope they just forgot to mention 100kB+ of RAM on that landing page, and the 16kB data is just a DCACHE.
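
A rough back-of-the-envelope version of the "100s of clock cycles" estimate above. Every default below is an assumption rather than a datasheet value: a 32-byte cache line, an 8-clock command phase, a 24-bit address and the data both sent 4 bits per SPI clock, and 8 dummy clocks before the data arrives.

```python
def qspi_miss_cycles(core_hz, spi_hz, line_bytes=32,
                     cmd_clocks=8, addr_bits=24, dummy_clocks=8):
    """Estimate core cycles stalled on one cache-line fill over quad SPI.

    All defaults are assumptions for illustration, not datasheet values.
    """
    addr_clocks = -(-addr_bits // 4)       # 4 address bits per SPI clock
    data_clocks = line_bytes * 8 // 4      # 4 data bits per SPI clock
    spi_clocks = cmd_clocks + addr_clocks + dummy_clocks + data_clocks
    # Convert SPI clocks to core clocks, rounding up (integer arithmetic).
    return -(-(spi_clocks * core_hz) // spi_hz)
```

With a 320 MHz core and a 50 MHz SPI clock, `qspi_miss_cycles(320_000_000, 50_000_000)` comes out at 551 core cycles per miss under these assumptions, which is in line with the estimate.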

TD-Linux · on Nov 30, 2016

The limitation that I referred to (same as open-v) is that it's very difficult to get IP for embedded NVRAM, and requires more complicated processes. So this seems like the next best option. Yes, misses are going to be very expensive, but I'd rather have icache than manually paging code (plus you can use the 16KiB internal RAM for tight timing loops). More RAM would of course be better, but probably not the first thing I would add in the next revision (that would be more peripherals).

Also note that Quad SPI flash is 4 bits wide, and is NOR flash.

mattthebaker · on Nov 30, 2016

Yes, the process trade offs are not so good if you want embedded flash. You need a process that can withstand the high voltages for flash erase, which means thick oxides and slow transistors.

At 320MHz, you should have icache and paged code. A process that can clock this high should have no problem with RAM densities to offer more than 16kB. Better to mirror what the entire industry does: low clock speed and embedded flash, or better process with faster clocks, more RAM and external flash.

Quad SPI NOR is still SPI, which requires serial timings. You need to serialize address and data with each transaction. Parallel NOR has parallel address and data, and an order of magnitude improved throughput and latency.

There is really no point to 320MHz with such slow code memory.

TD-Linux · on Nov 30, 2016

Yeah, even with the Opus application I mentioned, 320MHz is way overkill - 50MHz would have been plenty. It kind of seems like for some reason they were able to get a modern process, but only a really tiny die. It's pretty interesting that Open-V made the same tradeoff - 160MHz, but only 8KiB of RAM shared between code and data (with no memory mapped SPI flash).

mwcampbell · on Nov 30, 2016

Would built-in I2S be required for an audio DAC add-on like the Teensy audio adapter [1] to be practical?

[1]: http://pjrc.com/store/teensy3_audio.html

TD-Linux · on Nov 30, 2016

Yeah, bit banging I2S would be gross. You could, however, use the PWM outputs for audio as long as the requirements aren't too strict. And of course, you could always implement it with extra hardware. Hoping for I2S on the next version :)
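
To put a number on "not too strict": for a simple counter-based PWM, resolution and carrier frequency trade off directly. The sketch below assumes the PWM counter is clocked at the full 320 MHz core clock, which the thread does not confirm; the function names are illustrative.

```python
def pwm_carrier_hz(clock_hz, bits):
    """PWM repetition rate when the counter spans 2**bits clock ticks."""
    return clock_hz / (1 << bits)

def max_pwm_bits(clock_hz, min_carrier_hz):
    """Largest resolution whose carrier stays at or above min_carrier_hz."""
    bits = 0
    while pwm_carrier_hz(clock_hz, bits + 1) >= min_carrier_hz:
        bits += 1
    return bits
```

Under that assumption, 12-bit PWM gives a carrier of 78125 Hz, and 12 bits is also the most you can get while keeping the carrier at or above a 48 kHz sample rate. 16-bit resolution would drop the carrier to under 5 kHz, well inside the audio band.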

david-given · on Nov 30, 2016

This is the first time I've come across this concept (outside the ESP ecosystem); just to check I'm understanding this correctly, this means I can hook up a serial SRAM or equivalent device and the hardware will automatically demand-page arbitrary amounts of code out of it, right?

What's the throughput like? Can this be used for data as well (which will need caching too)?

Because this suddenly makes the device much more interesting; for everything I've done with microcontrollers (which, I'll admit, tends to be abusive), I would happily trade performance for some more RAM.

TickleSteve · on Nov 30, 2016

No, the QSPI device is simply memory mapped, as in quite a lot of (most) microcontrollers.

The external QSPI FLASH just appears in the normal memory map, no demand-paging.

Normally used for eXecute-In-Place code (XIP) when the code is too large for the internal FLASH.

There is a performance trade-off, and external QSPI FLASH might run its bus at (for example) 50MHz, but that's still much slower than internal FLASH directly attached to the bus.

david-given · on Nov 30, 2016

But SPI devices can't be memory mapped --- it's a serial protocol and they can't be attached to the bus. For this to work, something must be converting bus accesses into SPI requests. And then, at least assuming it's not doing an SPI read for every access, it needs to cache the result somewhere... but you've just said it's not doing demand paging?

I am now really confused.

TickleSteve · on Dec 5, 2016

There is a hardware block within the processor that converts memory-bus accesses into SPI accesses (for devices supporting the SPIFI standard for example). This makes the whole SPI device appear to be memory-mapped (for read accesses).
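
A small sketch of what such a block does for one read, written as software for clarity. The flash window base address is an assumption, and the frame layout (opcode 0x03 followed by a 24-bit offset) is the plain read command used by common SPI NOR parts; a real controller would more likely issue a quad read with dummy cycles.

```python
FLASH_BASE = 0x2000_0000        # assumed start of the memory-mapped flash window
FLASH_SIZE = 16 * 1024 * 1024   # 128 Mbit part
READ_CMD = 0x03                 # plain "read data" opcode on common SPI NOR flash

def bus_read_to_spi(bus_addr):
    """Translate a CPU read address into the bytes clocked out on SPI."""
    offset = bus_addr - FLASH_BASE
    if not 0 <= offset < FLASH_SIZE:
        raise ValueError("address is outside the flash window")
    # Opcode followed by a 24-bit big-endian flash offset.
    return bytes([READ_CMD]) + offset.to_bytes(3, "big")
```

For example, a load from `0x20401234` becomes the four bytes `03 40 12 34` on the wire, after which the flash streams data back. Writes are not translated this way, which is why the mapping is read-only.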

makomk · on Dec 1, 2016

It's got a standard two-way associative cache in front of the SPI interface, much like you might have between RAM and the external bus in a normal CPU. If available, demand paging would be the next step after the data wasn't found there either - a slower, higher-level concept that's handled in software rather than hardware.
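
A toy model of a two-way set-associative cache, to make the lookup concrete. The 16 KB size comes from the spec quoted elsewhere in the thread; the 32-byte line size and the LRU replacement policy are assumptions made for the sketch, not details of the actual chip.

```python
class TwoWayCache:
    """Tiny model of a two-way set-associative cache with LRU replacement."""

    def __init__(self, size_bytes=16 * 1024, line_bytes=32):
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (line_bytes * 2)
        # Each set holds up to two tags, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up addr; return True on a hit, False on a miss (line fill)."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(ways) == 2:
            ways.pop(0)           # evict the least recently used way
        ways.append(tag)          # this is where the SPI read would happen
        self.misses += 1
        return False
```

With these parameters, addresses 8 KiB apart land in the same set, so two such code regions can coexist but a third one evicts the least recently used of them.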

smilekzs · on Nov 30, 2016

320 MHz core clock is very impressive, compared to <= 200 MHz typical of Cortex-M3/4 impls. However at this freq, flash (instruction) readout becomes the bottleneck. While Cortex-M chips typically include some on-chip flash acceleration, this chip instead went for external flash + I-Cache. The new Cortex-M7 chip STM32F7 has both flash accel and I-Cache. It remains to be seen whether this chip can sustain real-world workloads at 320 MHz with 0 wait state. Even at 200 MHz it has the potential to replace proprietary fixed-point low-end DSP chips.

childintime · on Nov 30, 2016

Modern non-volatile RAM technologies (like XPoint) would be perfect for chips like this. Much faster than flash and usable as RAM.

ChuckMcM · on Nov 30, 2016

I like that 3.3V and 5V I/O is supported, so many Arduino shields assume 5V and so "compatible" boards like the ST Nucleo boards aren't actually compatible.

andkon · on Nov 29, 2016

For someone who has no idea what RISC-V is but loves making things with Arduino, what new things could I do with this?

rwmj · on Nov 29, 2016

RISC-V is an open source ISA: https://en.wikipedia.org/wiki/RISC-V

It has a number of implementations, both fully open source ones (BSD licensed), and proprietary. This one is based on the open source Rocket Chip implementation (https://github.com/ucb-bar/rocket), which is a simple in-order design, something like ARM Cortex-M or Intel Atom. (There is also one open source out-of-order design called BOOM - https://github.com/ucb-bar/riscv-boom)

fuzzythinker · on Nov 29, 2016

Agree, for someone who has no idea what RISC-V is, which I'm guessing is more than 50% of visitors, the "why" on the page needs to sell me on why RISC-V is better or at least better for certain things.

wyager · on Nov 29, 2016

If you aren't interested in architecture, there's probably no particular reason for you to buy this at this point. RISC-V is just getting started, so there aren't huge practical benefits yet. In the long run, we'd obviously prefer to use open-source unencumbered hardware, which is the point of the RISC-V project.

cestith · on Nov 30, 2016

If RISC-V really takes off and you're a first mover writing the low-level stuff for it then the early mover advantage is a good investment.

wolfgke · on Nov 29, 2016

I rather believe that if you don't know what RISC-V is, you simply are not the audience.

kristianp · on Nov 30, 2016

Then you're not fulfilling your potential for new people to enter that audience: https://xkcd.com/1053/

BuuQu9hu · on Nov 30, 2016

https://youtu.be/QTYiH1Y5UV0

fuzzythinker · on Nov 30, 2016

Thank you, the video is very informative. Link should definitely be on the site.

56245623456 · on Nov 29, 2016

I think the RISC-V is way faster, otherwise you probably won't have any advantage, except knowing that the whole ISA is opensource in contrast to AVR chips.

iammyIP · on Nov 29, 2016

The RISC-V seems to have its focus on being open and being simple, not so much on being the fastest. The specs are only 130 pages.

https://riscv.org/risc-v-foundation/

https://riscv.org/specifications/

monocasa · on Nov 30, 2016

It's a simple ISA, but built to be fast. It's specifically designed to map well to OoO cores at the slight expense of some features on in order cores that give you little boosts (like branch delay slots and other exposed pipeline features). Last I saw, BOOM compared very well to other OoO cores at the same gate count.

userbinator · on Nov 30, 2016

If anything I think RISC-V is repeating the same mistakes as MIPS and the original RISCs, by being far too simple and requiring much greater fetch bandwidth in the process.

The performance of MIPS, which it is closest to, has never really been considered anything more than "acceptable". It loses to ARM (which isn't so RISC-y anyway) and x86, so I expect RISC-V to be about the same:

https://www.extremetech.com/extreme/188396-the-final-isa-sho...

http://www.extremetech.com/wp-content/uploads/2014/08/Averag...

Compared to an 8-bit AVR in an Arduino it's definitely much faster, but compared to other 32-bit architectures, it is not.

monocasa · on Nov 30, 2016

They've made their compressed ISA a first class citizen, with code density that's in the Thumb2 range.

pawadu · on Nov 30, 2016

To be fair, the new 64-bit ARMs are becoming "more RISC-y" and have the same problems as MIPS.

dmitrygr · on Nov 29, 2016

It promises to maybe be faster. Has no internal flash, cannot be bought in volume if you intend to actually make anything, has a very immature toolchain, and currently is a theory only.

ajross · on Nov 29, 2016

These SiFive folks claim to have working silicon, there are what look like some power measurements on their site, and there are photographs of boards available for a claimed ship date in ~3 weeks.

It's not in my hands personally, but they seem to be over the vapor hump.

I found it an exciting enough product to drop $60 on, anyway.

mastax · on Nov 29, 2016

Your disclaimers are fair in that if you want an Arduino, buy an Arduino. But if a 320 MHz anything is slower than a 24 MHz AVR then the sky is falling.

ajross · on Nov 29, 2016

It's a much more capable device. The existing RISC-V instances (I don't know much about SiFive's offering per se, I'm just reading off their datasheet) are more comparable to ARM Cortex-M parts -- running in the dozens to hundreds of MHz, with an MMU and a real OS kernel (Linux, obviously, though surely there are BSD ports in the works).

Hardware has, in addition to the GPIO, UART and ADC/DAC that you're used to from Atmel, a USB 2.0 device controller (it says "OTG", which implies a host controller too, but sometimes people get this mixed up), a SD/MMC/eMMC controller for storage (no idea about SDIO) and gigabit ethernet.

I'm a little disappointed in the lack of wireless connectivity on the SoC, though I suppose you can make up for that with off the shelf USB devices (or a UART bluetooth radio).

TD-Linux · on Nov 29, 2016

The chip linked is the E310G, which has neither an MMU nor a real OS kernel (same as Cortex-M). It also doesn't have most of the peripherals you listed (maybe you got it confused with their other, larger chip). It really is an Arduino competitor (though much faster).

mSparks · on Nov 29, 2016

That sounds like these boards are getting to the point they can handle decent audio/signal throughput. But I don't see anything about usb or adc/dac on the site?

noselasd · on Nov 29, 2016

If you want an arduino compatible device that can do audio processing, go for a Teensy board (https://www.pjrc.com/teensy/ , http://www.pjrc.com/teensy/td_libs_Audio.html)

mSparks · on Nov 30, 2016

72/120MHz isn't really enough to do "decent" audio/signal processing with low distortion.

You can probably get it to work with a lot of effort and optimisation, but there'll be very little room to do anything with it afterwards.

Afaik AC97 chips run at around 24MHz, are dedicated to the job, and even they stop at 20 bit resolution and sticking the data onto a data bus. Getting better than that is "hard", which is why all the manufacturers pretty much standardised around such a low standard.

TD-Linux · on Nov 30, 2016

That speed is quite enough to do audio processing - you can run Opus easily on those microcontrollers. Many other audio processing tasks are simpler. AC97 chips are basically DACs/ADCs, and do no audio processing themselves, so are a bad comparison.

However, while most Cortex-M chips have an I2S peripheral to integrate with an external DAC, the HiFive1 doesn't, which might cause some difficulties. Audio out can be implemented with the PWM peripheral, though.

mSparks · on Nov 30, 2016

Depends what you mean by "enough".

24MHz is "enough" for 20 bits@96kHz ADC and some post processing.

But 20 bits@96kHz is not decent.

For reasonable SNR, you need at least 24 bits, and even then "the experts" offload to an external CPU http://www.tested.com/tech/pcs/454839-tested-why-high-end-pc...

For signal (less audio and more "controller") with high precision you need microcontrollers with the power of at least an early 2000s PC (several hundred MHz and single cycle mul/div).

Raspberry Pi 3 is close, but it needs an external ADC/DAC.
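
For reference, the textbook relation between bit depth and the best-case signal-to-noise ratio of an ideal quantizer (full-scale sine input) is about 6.02 dB per bit plus 1.76 dB. The helper below is only that formula; real converters fall short of it because of analog noise.

```python
def ideal_snr_db(bits):
    """Ideal SNR in dB of an N-bit quantizer for a full-scale sine wave."""
    return 6.02 * bits + 1.76
```

That gives roughly 98 dB at 16 bits, 122 dB at 20 bits and 146 dB at 24 bits.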

TD-Linux · on Nov 30, 2016

Firstly, https://xiph.org/~xiphmont/demo/neil-young.html

Secondly, even if you did want to process at 96kHz, you'd have plenty of CPU left to do so. It's only 2x as intensive as 48kHz (this is a 32 bit CPU so using 16 bit vs 32 bit math is mostly the same, sans DSP instructions) and that amount of headroom is likely available, for example: https://www.rockbox.org/wiki/CodecPerformanceComparison

Thirdly, the article you linked talks about high end DACs but says nothing about the DSP on the card, other than that it has one, for doing... something (?)
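
The headroom argument can be checked with simple arithmetic: the cycle budget per sample is the core clock divided by the sample rate. The clock figures are the ones mentioned in the thread (72/120 MHz Teensy parts, 320 MHz HiFive1); the function name is illustrative.

```python
def cycles_per_sample(core_hz, sample_rate_hz):
    """Whole CPU cycles available between consecutive audio samples."""
    return core_hz // sample_rate_hz
```

At 72 MHz that is 1500 cycles per sample at 48 kHz and 750 at 96 kHz; at 320 MHz it is 3333 cycles per sample at 96 kHz. Whether that is enough depends on the per-sample cost of the algorithm, which this sketch does not model.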

mSparks · on Nov 30, 2016

Sigh,

Firstly, getting from uncompressed at the input to compressed is basic processing.

Seriously, you don't. Non-RISC instructions often take multiple cycles, so you can only do a tiny number of them between samples, usually just enough to compress it to fit the bus speed without loss. https://en.m.wikipedia.org/wiki/Cycles_per_instruction

Thirdly, clearly you didn't rtfa.

Fourthly, go disagree on the teensy forum. https://forum.pjrc.com/threads/27364-Teensy-3-1-and-ADC-FIR-...

pjc50 · on Nov 30, 2016

Most sensible people stick to 16 bits @ 48kHz, especially if it's not a pro-audio device but just something with a MEMS mic and 8ohm speaker.

mSparks · on Nov 30, 2016

And they would be interested in signal processing a 50kHz LF radio transmission why?

mwcampbell · on Nov 30, 2016

The Teensy audio adapter [1] uses a dedicated DAC, the SGTL5000. That frees up the MCU to do more DSP.

[1]: http://pjrc.com/store/teensy3_audio.html

mSparks · on Dec 4, 2016

I should have been more specific.

Radio signals below 50 kHz are capable of penetrating ocean depths to approximately 200 metres, the longer the wavelength, the deeper. The British, German, Indian, Russian, Swedish, United States and possibly other navies communicate with submarines on these frequencies.

-> That requires a minimum 100 kHz sample rate.
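
The 100 kHz figure is the Nyquist criterion applied to a 50 kHz signal. A small sketch, with illustrative function names, that also shows where a tone ends up if the sample rate is too low:

```python
def min_sample_rate_hz(max_signal_hz):
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * max_signal_hz

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling."""
    nearest_multiple = sample_rate_hz * round(signal_hz / sample_rate_hz)
    return abs(signal_hz - nearest_multiple)
```

A 50 kHz tone sampled at 96 kHz would show up at 46 kHz, and at 48 kHz sampling it would fold down to 2 kHz. In practice the rate has to be somewhat above twice the signal frequency to leave room for the anti-aliasing filter.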

mwachs5 · on Nov 29, 2016

You are correct. This first FE310 chip on the HiFive1 board does not have a built in ADC/DAC.

cmrdporcupine · on Nov 29, 2016

320mhz clock speed but only 16KB of RAM.

RISC-V is appealing but if I'm stuck at 32KB or less of RAM I'd stick with the Parallax Propeller which has 8 parallel 100mhz cores.

cr0sh · on Nov 29, 2016

I'm not sure that's the whole story - specs read:

Memory: 16 KB Instruction Cache, 16 KB Data Scratchpad

I'm wondering if it's possible to combine that or dice it in some way? On top of that, the program resides in SPI flash (128 Mbit -> 16 meg).

EDIT: ok - the above makes no sense, so yes, only 16K on-board RAM (and the other is for cache).

Short of more info, I'd be willing to bet that some of that flash can be set aside (or used like) variable space (albeit at a slower speed), and the on-board memory is more for high-speed stuff (and you'd have to swap things in/out - though likely they'll have a library for all of that - maybe).

If all of that is true (or close to the truth) - well, I don't know if it would be better than the propeller or whatnot, but it certainly looks interesting...

EDIT:

Reading the infosheet on the processor:

https://dev.sifive.com/documentation/freedom-e310g-0000-manu...

It does seem like the flash can be used for data and program space - and it appears like it can be read/written to from the cpu - so it's kinda like the flash storage on the Arduino. I would imagine it can be used similarly - although slower - as variable memory (given a proper lib); and "paged" into the faster on-board 16k RAM.

mwachs5 · on Nov 29, 2016

You're correct, you can access the 128MBit SPI Flash as any other read-only memory mapped memory -- you can execute out of it or load data (you can also write to it but need to use a separate channel, it's not directly memory mapped to write).

You're also correct on the sizes of the ICache and Scratchpad. You can execute code which resides in the scratchpad, but can't store data in the I-Cache.
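
For anyone checking the sizes quoted in this subthread, the conversion is short. The helper names are illustrative.

```python
def mbit_to_bytes(mbit):
    """Convert a flash size in Mbit (2**20 bits) to bytes."""
    return mbit * 1024 * 1024 // 8

def address_bits(size_bytes):
    """Number of address bits needed to reach every byte."""
    return (size_bytes - 1).bit_length()
```

128 Mbit is 16,777,216 bytes (16 MiB), which needs a 24-bit address; that is 1024 times the 16 KB data scratchpad.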

cmrdporcupine · on Nov 30, 2016

Does the MCU expose address lines to wire in external RAM?

Sanddancer · on Nov 30, 2016

No. The only external addressing is through QSPI0, and that only has one chip select line hooked to it. They do make small amounts -- 512k -- of SPI RAM, but that on its own would make the chip boot process interesting.

cestith · on Nov 30, 2016

You seem to be a more likely customer of their U500 platform, at over 1 GHz, 64 bit, cache-coherent multicore, DDR3/4 controller, USB 3.0, PCIe 3.0, and gigabit Ethernet. That one's also on a 28nm process rather than 180nm. That should run Linux or an equivalent like the Raspberry Pi 3 or the Pine64 can. They don't seem to have an SBC ready for market with that chip yet.

This is their embedded / tinker / maker single-board computer based around their E300 platform with on-board SRAM. It's more like an Arduino or a Pi Zero.

wolfgke · on Nov 29, 2016

> Parallax Propeller which has 8 parallel 100mhz cores.

According to https://en.wikipedia.org/w/index.php?title=Parallax_Propelle... it is only up to 80 MHz.

cmrdporcupine · on Nov 30, 2016

Overclocks safely to 100mhz

i336_ · on Nov 30, 2016

Two points: a short question and a longer theory about why only 16KB.

The question: how viable is an open-source, publicly-auditable secure boot implementation? Not TPM theater or whatever, but a hardened hardware configuration that could be used to implement truly verifiable boot.

(If anyone from RISC-V is reading this, I think there is a very noteworthy amount of money in building a truly securely bootable reference design. Hopefully lots of people have already told you that.)

--

Next, regarding why only 16KB RAM...

I'm very interested in the potential of open source ISA, but admittedly completely ignorant about chip design.

With this in mind, I have a theory that the manufacturers deliberately provided a ridiculously low amount of RAM in order to make it impossible[1] to run Linux on it and so keep it out of the mass market.

Considering the CPU is full 320MHz I don't think this is because they have some other product up their sleeve. Rather, looking at the fact that this is the first marketed product they have available that's an actual real chip (!!), I would expect that either the chip itself and/or the chipset likely has bugs in it. They've gone through many internal revisions, and now they feel comfortable with putting the chipset out there for public testing to get bugreports from the field.

My thinking is that applications that will play nice with 16KB of RAM will stress the CPU out significantly less than full Linux would, similarly to how doing basic tasks on a faulty x86 PC may work for years but compiling GCC will find broken bits in RAM sooner or later.

There's also the fact that such projects are also generally quieter as a whole than the thundering herd of people wanting to run Linux on things.

Don't forget, RISC-V has been nothing more than a bunch of VHDL for years, running on "perfect" FPGAs that don't require you to think about low-level electrical niggles and whatnot. If I'm understanding correctly, this is the first time a real RISC-V chip run has been done (?) and made available to the market.

Considering the success of ARM (or, more generically, the market sector of "little PCBs that run Linux"), RISC-V needs to stay competitive and attractive - and full Linux that constantly oopses in mm.c will make RISC-V look real bad real fast. I definitely want the ISA to thrive, and if my assumptions are correct capping the RAM makes an inelegant-yet-elegant sort of sense.

I expect RISC-V will be running Linux on real fabbed chips within the next two years.

[1]: Well, practically impossible. If you don't mind 300KB/s RAM (yes, KB/s, not MB/s) there's always http://dmitry.gr/index.php?r=05.Projects&proj=07.%20Linux%20... :D

pjc50 · on Nov 30, 2016

> how viable is an open-source, publicly-auditable secure boot implementation?

Depends if you trust your fab. I suppose you can take random samples and decap them at considerable expense, but then the public has to trust the auditor.

> I think there is a very noteworthy amount of money in building a truly securely bootable reference design.

I disagree. I think there are two critical problems with this: firstly, getting the community to agree on what they consider "truly secure", and secondly getting enough people to buy a system that is (necessarily due to small production runs) slower and more expensive than comparable Intel or even ARM.

If you're willing to trust a manufacturer you can buy secure-bootable ARM devices today with OTP key regions that boot Linux (e.g. iMX). So the market for the proposed "open" system is only people who are willing to trust you but are paranoid enough to not trust one of the existing manufacturers.

i336_ · on Nov 30, 2016

All these points are very true; chains like these are unfortunately based on a root of implicit trust.

I make some counter-arguments:

Secure boot forms the trust basis that the device is definitively running the code you put on it without modification, so is arguably the most security-sensitive aspect of the system, in some ways more critically so than the kernel, network-facing daemons, etc. It is at least as important as those components.

I'm confident a publicly-auditable open-source secure boot implementation would attract fairly reasonable academic interest from the security field and be hacked on (from theoretical design down to implementational edge cases) by the community until it was very very good.

That would help avoid this sort of thing - http://www.cnx-software.com/2016/10/06/hacking-arm-trustzone... - which is currently only an issue because vendor engineering teams are not perfect and there's no widespread collaboration. (There is, of course, also the likely truth that there are similar "vulnerabilities" in all commercial secure boot implementations. Think TSA007.)

The one issue I will acknowledge is that if such a reference design existed and was widely implemented, it would be a very good question as to which manufacturers had "accidents" in the manufacturing process near the secure-boot areas of the chips.

If I understand correctly, the other very major issue (which is kind of really ironic considering what I've just said) is that you have to sign NDAs to understand how the implementation works, and AFAIK even to just use it. So I (a random tinkerer) can't configure a "really trustworthily secure" Linux system, only a major manufacturer/system integrator/etc can. I understand this situation but from the standpoint of paranoid individual security it's crazy - security by obscurity, anyone?

If it is possible for me to play with OTP on iMX from a hobbyist perspective without shelling out for some NDA'd SDK, I'm tentatively interested for what it's worth.

pjc50 · on Nov 30, 2016

imx53 is sort of possible to play with: https://cache.freescale.com/files/32bit/doc/ref_manual/iMX53...

Boot documentation is chapter 7. References "high assurance boot" functionality in the onboard boot ROM, but doesn't give out docs for the ROM. On the other hand, it's not a large ROM so you could just dump and reverse-engineer it.

i336_ · on Nov 30, 2016

Interesting... but I find it really amusing that my option is to reverse-engineer the secure boot system in order to take advantage of it, and also that the reverse-engineering process won't be that hard. For what I assume is a security-by-obscurity design, reverse-engineering should by definition be really hard, I would think, at least at face value.

pjc50 · on Nov 30, 2016

The whole of your second point is ridiculous conspiracy theory. It is perfectly reasonable to do enough formal verification to ensure that your chip works first time. Linux is not going to "stress" your chip more. And nobody deliberately cuts out a viable market sector unless they've got another product to put in it.

On-chip SRAM is just expensive in terms of area. It's comparable to what you get on Cortex-M devices. There are no chips with enough onboard SRAM to run Linux. You can't really run Linux without an external DRAM interface, and then you have to find somewhere to put the DRAM. People keep forgetting about this on the Pi because the DRAM is stuck on top of the SoC package.

Sure, maybe if this is a success they'll pay the licensing fees for a DDR3 interface + MMU, or write one themselves. I think you'd also want at least 600MHz for Linux, 320 is kind of slow these days.

i336_ · on Nov 30, 2016

> The whole of your second point is ridiculous conspiracy theory.

Theory, yes. Conspiracy theory, emphatically not, sorry for the misunderstanding. I have no ill will towards the RISC-V ISA and no disagreement with SiFive's operations. Like others here I was just trying to figure out the huge disparity between the CPU clock speed and the onboard memory, in my case with sorely insufficient understanding of the field.

> It is perfectly reasonable to do enough formal verification to ensure that your chip works first time.

Oh, okay then. That's really amazing, I didn't know that :)

> Linux is not going to "stress" your chip more.

Like I said, I'm a bit ignorant here. I was just thinking along the lines of how eg an i7 with faulty L2 cache could be re-designated as an i5, but in this case they're not sure what's faulty, etc.

There's also the fact that CPUs do have errata... (?)

> And nobody deliberately cuts out a viable market sector unless they've got another product to put in it.

Absolutely! What I was saying was that I theorize that this is a first-run design and that another chip was going to follow up. I'm increasingly confident I'm wrong about exactly why (e.g., now I understand about the RAM problem).

> On-chip SRAM is just expensive in terms of area. It's comparable to what you get on Cortex-M devices. There are no chips with enough onboard SRAM to run Linux. You can't really run Linux without an external DRAM interface, and then you have to find somewhere to put the DRAM.

I see. Mmm :/

> People keep forgetting about this on the Pi because the DRAM is stuck on top of the SoC package.

(Has a look at a picture.) Oh wow, so it is, that's amazing.

> Sure, maybe if this is a success they'll pay the licensing fees for a DDR3 interface + MMU, or write one themselves.

Hopefully this run is an incentivisation to get some support for that!

> I think you'd also want at least 600MHz for Linux, 320 is kind of slow these days.

Hmm, that's quite true, yeah. (It's kind of sad how speed-hungry the kernel is nowadays - I have a 32MHz PDA with an OS (EPOC, precursor of Symbian) that draws draggable windows on a little monochrome touchscreen LCD, I can fling the windows around faster than the crystals can update :D)

I would absolutely buy a 320MHz Linux device with an open-source secure boot story though. Text communication doesn't need a snappy CPU.

pjc50 · on Nov 30, 2016

> huge disparity between the CPU clock speed and the onboard memory

I think that's RISC "working as intended"; the tradeoff was always supposed to be that you got to issue lots of simple instructions at high speed. I can't find what manufacturing process they're using (?nanometers) but it sounds like it's simply a "why not?" outcome of the design process that the chip is very fast. It doesn't have all that many peripherals either, by modern standards.

Edit: the answer is here - https://news.ycombinator.com/item?id=13067833 - other chips that have onboard Flash are necessarily slower. This doesn't, so it can be faster.

SRAM just takes up a lot of space. Hard to tell without gate counts or die shots but the 16k+16k could easily be over half the die.
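
A back-of-the-envelope count gives a feel for the scale. The sketch below assumes the conventional six-transistor (6T) SRAM cell and counts only the storage cells for two 16 KiB arrays; decoders, sense amplifiers and any cache tags are ignored, and the real cell type used on this chip is not stated anywhere in the thread.

```python
BITS_PER_BYTE = 8
TRANSISTORS_PER_CELL = 6  # assumption: standard 6T SRAM cell


def sram_cell_transistors(kib: int) -> int:
    """Transistors in the storage cells alone for `kib` KiB of SRAM."""
    return kib * 1024 * BITS_PER_BYTE * TRANSISTORS_PER_CELL


first_array = sram_cell_transistors(16)
second_array = sram_cell_transistors(16)
total = first_array + second_array

print(f"one 16 KiB array: {first_array:,} transistors")
print(f"16k+16k total:    {total:,} transistors")
```

Whether that really is more than half the die depends on the size of the rest of the logic, which cannot be judged without gate counts or die shots.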

EPOC is one of those extraordinary things that can be called a great technological achievement with a tiny dedicated fandom that nonetheless became a dead-end. Like Amiga, Concorde, BBC Domesday Project, etc. I do wish we could have snappier GUIs on our ten-times-faster systems.

i336_ · on Nov 30, 2016

>> huge disparity between the CPU clock speed and the onboard memory

> I think that's RISC "working as intended"; the tradeoff was always supposed to be that you got to issue lots of simple instructions at high speed.

I see.

> I can't find what manufacturing process they're using (?nanometers) but it sounds like it's simply a "why not?" outcome of the design process that the chip is very fast.

Heh.

> It doesn't have all that many peripherals either, by modern standards.

This looks to me to have all the hallmarks of a first-gen MVP. A very decent offering likely with some long-term support, but an MVP nonetheless.

> Edit: the answer is here - https://news.ycombinator.com/item?id=13067833 - other chips that have onboard Flash are necessarily slower. This doesn't, so it can be faster.

I noticed that, it's a fascinating design tradeoff they picked.

> SRAM just takes up a lot of space. Hard to tell without gate counts or die shots but the 16k+16k could easily be over half the die.

Wow, TIL

> EPOC is one of those extraordinary things that can be called a great technological achievement with a tiny dedicated fandom that nonetheless became a dead-end. Like Amiga, Concorde, BBC Domesday Project, etc. I do wish we could have snappier GUIs on our ten-times-faster systems.

Mmm. I consider it insane that the Web is as slow as it is, but it makes a sad sort of sense. I've been wondering about making a cut-down general-purpose information rendering engine with a carefully-designed graphical feature set that's really easy to optimize. Would be really cool.

EPOC was awesome: some hand-wavy testing showed me that the OPL environment was fast enough to support full-screen haptic scrolling of information. It would have totally worked in just a few simple lines of code. If only full-panel capacitive touch were viable in '98 ;)

kobeya · on Nov 30, 2016

SiFive is working on two chips. This one is basically an Atmel competitor. They are also working on a workstation-class many-core chip.

i336_ · on Nov 30, 2016

Oh, I see. That makes sense.

I'm definitely looking forward to seeing the many-core design!

Klasiaster · on Nov 29, 2016

So does it have an MMU? That would make it even better than Arduino boards.

mwachs5 · on Nov 29, 2016

The RISC-V ISA defines different privilege levels: Machine, Hypervisor, Supervisor, User. It is possible for a chip to conform to the RISC-V Privilege Spec v1.9 by only implementing the bare metal or Machine mode, which is what the chip on this board does. So the chip on this board does not have User mode or an MMU.
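
As a rough model of the above, the sketch below lists the privilege levels with the 2-bit encodings given in the v1.9-era privileged spec, plus the mode combinations that spec lists as supported (as far as I recall its table; check the spec before relying on it). The helper names `is_conforming` and `can_isolate_user_code` are invented for the example.

```python
from enum import IntEnum


class Priv(IntEnum):
    """Privilege levels with their 2-bit encodings (v1.9-era spec)."""
    U = 0  # User
    S = 1  # Supervisor
    H = 2  # Hypervisor
    M = 3  # Machine


# Supported combinations, as recalled from the privileged spec's table.
ALLOWED_COMBINATIONS = [
    {Priv.M},
    {Priv.M, Priv.U},
    {Priv.M, Priv.S, Priv.U},
    {Priv.M, Priv.H, Priv.S, Priv.U},
]


def is_conforming(modes) -> bool:
    """True if the implemented set of modes is one of the listed combinations."""
    return set(modes) in ALLOWED_COMBINATIONS


def can_isolate_user_code(modes) -> bool:
    """Isolating untrusted code needs at least a User mode below Machine."""
    return Priv.U in set(modes)


this_chip = {Priv.M}  # Machine mode only, per the comment above
print(is_conforming(this_chip), can_isolate_user_code(this_chip))
```

A Machine-only part is conforming, but everything on it runs with full privilege.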

mwcampbell · on Nov 29, 2016

It seems to me that an MMU is no good if the board doesn't have enough RAM to run a general-purpose operating system, as opposed to a bare-metal program or a real-time OS like a microcontroller usually uses.

wolfgke · on Nov 29, 2016

> It seems to me that an MMU is no good if the board doesn't have enough RAM to run a general-purpose operating system, as opposed to a bare-metal program or a real-time OS like a microcontroller usually uses.

For a realtime operating system at least a Memory Protection Unit (to use the word that ARM introduced for this limited form of an MMU) is very useful since it can easily make the OS much more reliable if a process cannot write to memory addresses of the kernel or other processes.

EDIT: Or is such an MPU implied by the support for "Privileged ISA Specification v1.9.1"?
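
To illustrate what such an MPU buys you, here is a minimal Python sketch of a region-based write check with no address translation. The memory map, the region sizes and the names `Region`, `may_write`, `KERNEL` and `TASK_A` are all hypothetical; an actual MPU does this comparison in hardware on every access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    writable: bool

    def contains(self, addr: int) -> bool:
        """True if addr falls inside [base, base + size)."""
        return self.base <= addr < self.base + self.size


def may_write(regions, addr: int) -> bool:
    """Allow a write only if some region covers addr and is writable."""
    return any(r.contains(addr) and r.writable for r in regions)


# Hypothetical memory map as seen by one task.
KERNEL = Region(base=0x8000_0000, size=0x1000, writable=False)
TASK_A = Region(base=0x8000_1000, size=0x0800, writable=True)
task_a_view = [KERNEL, TASK_A]

own_data = may_write(task_a_view, 0x8000_1004)     # inside TASK_A
kernel_data = may_write(task_a_view, 0x8000_0010)  # inside KERNEL, read-only
unmapped = may_write(task_a_view, 0x8000_2000)     # covered by no region
```

A stray pointer in the task then faults instead of silently corrupting kernel state.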

pjmlp · on Nov 30, 2016

> is very useful since it can easily make the OS much more reliable if a process cannot write to memory addresses of the kernel or other processes.

If you make use of memory-safe systems programming languages on bare metal, like Ada, SPARK, Rust, Oberon-07, then it isn't usually an issue, since the unsafe code will be quite constrained.

For example, http://www.astrobe.com/boards.htm

56245623456 · on Nov 29, 2016

The website says it is conforming to "RISC-V Privileged ISA Specification, Version 1.9.1" which seems to specify a paging mechanism. I didn't read all of it though, so I may have overlooked an "optional" or something.

rwmj · on Nov 29, 2016

The most basic option for RISC-V is base + bound in M-mode which isn't paging and can barely be described as an "MMU". (Of course RISC-V also supports much more advanced options)
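
For anyone unfamiliar with base + bound, a generic sketch in Python follows. It models the textbook scheme (check the address against a bound, then add a base) rather than the exact register semantics of the RISC-V privileged spec; `translate`, `AccessFault` and the example values are invented for the illustration.

```python
class AccessFault(Exception):
    """Raised when an address falls outside the permitted window."""


def translate(vaddr: int, base: int, bound: int) -> int:
    """Map a program-visible address to a physical one with base + bound.

    The address must lie in [0, bound); the physical address is base + vaddr.
    """
    if not 0 <= vaddr < bound:
        raise AccessFault(f"address {vaddr:#x} is outside bound {bound:#x}")
    return base + vaddr


# Hypothetical window: 4 KiB of RAM starting at 0x8000_0000.
paddr = translate(0x10, base=0x8000_0000, bound=0x1000)
print(hex(paddr))
```

There is a single contiguous window and no per-page mapping, which is why this can barely be described as an MMU.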