I looked over the Chisel source code and the barebones datasheet [1]. This chip has a few notable features:
1. It doesn't have any onboard NVRAM (the same limitation as the Open-V). However, it does have a directly memory-mapped quad-SPI peripheral and an icache, which is a good alternative and might be better for applications that need a large amount of data. Note that an icache would still be required even with onboard NVRAM, because flash can't keep up with the core clock. You could also make swappable game cartridges, for example.
2. It has enough RAM and speed to run Opus.
3. The rest of the peripheral set is pretty barebones, with no analog peripherals like the Open-V has. No I2C or I2S without bit-banging, either.
4. The boot ROM has peripheral information stored in it. You might be able to have one binary that boots on different types of cores using this.
I'm worried this leads to the zoo of random hardware that we see on ARM, which makes supporting Linux distros on ARM such a PITA.
It would be better if the basic hardware — serial ports and such — was in a standard location for all RISC-V machines, and all the rest of the hardware was discoverable at runtime (like PCs, mostly).
The parent comment has proven oddly controversial going down to 0 and up to +5 and back down again. So let me try to do better and clarify what I mean. For context, I am maintaining Fedora on RISC-V: https://fedoraproject.org/wiki/Architectures/RISC-V .
On a PC, there is a serial port at a standard location. It's not "discoverable", but every PC you care about has one at the same address, and it's incredibly easy to use with a tiny bit of assembler. You can always print messages, even in the earliest parts of early boot.
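For the flavor of it, a minimal sketch of that early-boot path (hedged: this assumes the conventional COM1 base of 0x3F8 and that firmware has already programmed the baud rate):

    #include <stdint.h>

    #define COM1 0x3F8  /* conventional first serial port on every PC */

    static inline void outb(uint16_t port, uint8_t val) {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    static inline uint8_t inb(uint16_t port) {
        uint8_t val;
        __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    static void serial_putc(char c) {
        while (!(inb(COM1 + 5) & 0x20))   /* poll LSR until THR is empty */
            ;
        outb(COM1, (uint8_t)c);
    }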
On non-PC platforms (I've used ARMv7, ARMv8, POWER) there's a zoo of serial ports. There are at least a half dozen different bits of hardware, at different addresses, sometimes self-describing, sometimes not discoverable at all, and mentioned in the device tree in multiple different ways. Some even sit behind PCI, which means ugly hacks are needed to output early boot messages.
Critically, if you get the wrong serial port, you cannot see any boot messages at all. There's no way to tell what's going wrong.
So I think, for serial ports, the PC approach is definitely, clearly better.
For other hardware, it should just be self-describing. PCI and USB are the model examples here. And in fact, why not just use those?
I agree on the serial port thing. That's one device which I think should be architecture-standard. I can't imagine bringing up firmware or debugging a kernel without a low-level system serial port. It also makes remote management a lot easier, because the remote console can hook into that in a standard fashion.
It looks like we're moving towards a solution (in a slow, jerky, shambling way) by replacing hardcoded board files with device trees.
If RISC-V does this well out of the gate (like with IBM's old school Open Firmware) and has great shiny device tree support, I think we might end up like x86 -- where a single thumbdrive can boot almost any x86 machine under the sun.
Or you can solve it as done in GRLIB (www.gaisler.com/index.php/downloads/leongrlib), by having all on-chip peripheral information (memory and IRQ maps, etc.) specified in a ROM configuration area, which is automatically generated at synthesis time.
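For illustration, a hedged sketch of what scanning such a plug-and-play area might look like; the record layout here (8 words per slave, vendor/device IDs packed into word 0, slave records at 0xFFFFF800) is from memory of GRLIB's AMBA plug-and-play scheme, so treat the constants as assumptions and check the GRLIB manual:

    #include <stdint.h>

    /* Assumed location of the AHB slave plug-and-play records. */
    #define AHB_SLV_PNP ((const volatile uint32_t *)0xFFFFF800u)

    void scan_ahb_slaves(void) {
        for (int i = 0; i < 64; i++) {
            uint32_t id = AHB_SLV_PNP[i * 8];    /* 8 config words per device */
            if (id == 0)
                continue;                        /* empty slot */
            uint32_t vendor = id >> 24;          /* assumed field layout */
            uint32_t device = (id >> 12) & 0xFFF;
            /* ...look up (vendor, device) in a driver table... */
            (void)vendor; (void)device;
        }
    }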
Device trees need to be written to match the hardware, compiled, and supplied to the kernel. So it's not really as nice as "boot this kernel on any hardware".
Device trees are needed because manufacturers did not create self describing hardware.
> Device trees are needed because manufacturers did not create self describing hardware.
This is impossible to do beyond trivial components. Most devices are complex systems of interacting components.
Also, do you want the same lazy manufacturer that couldn't be bothered to create a device tree to create a complete hardware-description ROM, and get it right on the first attempt?
In practice "self describing" doesn't mean the hardware completely describes everything about itself. It only needs to describe enough that the OS can load the right driver and the driver can locate the hardware address, interrupts and so on. After that the complexity resides in the driver itself. PCI has been doing this sort of thing successfully for two decades, so we know it's possible. ACPI has been doing the same thing for power management, clocks, power zones, suspend etc, again for something like two decades.
In theory, maybe. But have you seen the corresponding code in the Linux kernel when it tries to probe a partially known component while working around all kinds of bugs, errata, and differences between silicon revisions? All those magic numbers and timeouts and dependencies that no one can really explain?
Configuration pages are a thing of the past; nowadays they should at best be used for confirmation and sanity checks. Device trees are the future and have already improved and simplified hardware management a lot (especially on ARM).
It would probably force the hardware manufacturer to think its design through a little more, which would be a good thing. I've seen enough of these SoCs with "complex systems of interacting components" to feel that a well-thought-out design needing less static description of SoC/board/CPU-level details would be beneficial.
You can bit-bang a lot with a 320MHz clock :-). That said, I wonder if we'll see a common standard emerge for an 'open' serial protocol. Sort of a PCIe-lite that runs at, say, 250MHz over a pair of LVDS pins and shows up as an AHB bus master in the memory matrix. That would really open up the peripheral market.
The frequency limit for bit-banging is usually set by the GPIO pins (well, their drivers) rather than the core clock. I use some ~250MHz PIC32MZs whose GPIOs cap out at around 50MHz, and even the Raspberry Pi, with a core clock of ~1GHz, can only bit-bang its GPIOs up to ~60MHz without running into problems. As for LVDS, it takes up more area and power, and you usually need to license the IP block rather than getting it for free from the standard cell library provider.
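To make the trade-off concrete, a hedged sketch of bit-banged SPI mode 0 output; GPIO_OUT and the pin masks are hypothetical stand-ins for whatever memory-mapped GPIO register a given part has, and the read-modify-write loop (not the core clock) is what bounds the achievable bit rate:

    #include <stdint.h>

    #define GPIO_OUT  (*(volatile uint32_t *)0x10012000u)  /* hypothetical address */
    #define PIN_SCK   (1u << 0)
    #define PIN_MOSI  (1u << 1)

    static void spi_send_byte(uint8_t b) {
        for (int i = 7; i >= 0; i--) {
            if (b & (1u << i))
                GPIO_OUT |= PIN_MOSI;    /* set the data line... */
            else
                GPIO_OUT &= ~PIN_MOSI;
            GPIO_OUT |= PIN_SCK;         /* ...then clock it out on the rising edge */
            GPIO_OUT &= ~PIN_SCK;
        }
    }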
Is there really a need? Most serial protocols can really be considered 'open'.
For low bandwidth there are RS232/UART/JTAG. Then there are I²C/SPI for multi-device buses: I²C ranges from 100kbit/s to 2.3Mbit/s, and SPI goes to 10-30Mbps.
For higher-bandwidth stuff there is whatever you want over USB, Bluetooth, 802.11-whatever WiFi, etc. AFAIK USB can be considered 'open': you might need to 'register' for a certification/vendor ID/logo, but from what I understand, if you don't want those you don't need to bother. There are also http://pid.codes/ and other organisations giving away free PIDs.
My experience is mainly from 8-bit CPUs, but unless you can afford one IRQ per bit, won't you just be busy waiting at 320 MHz instead? Performance tends to be less of a problem than the fact that you can't do anything else at the same time.
The ESP8266 and ESP32 have a very similar memory mapped quad-SPI setup. The (older, cheaper, more limited) ESP8266 is probably the most similar in terms of peripheral set etc, though that obviously still has more RAM, WiFi, I2C and I2S which this lacks.
Regarding point 1, this is a terrible design decision. Any low-power or high-performance uC that keeps code in external SPI loads it into RAM and executes from there. At 32MHz it starts to make sense to have an icache in front of embedded flash on a parallel bus. At 320MHz you want to cache your RAM. With 320MHz and external SPI (not even fast parallel NOR), you have to be insane. A single instruction-cache miss will cost you hundreds of clock cycles for a SPI read.
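To put rough numbers on that (these are assumptions; the datasheet doesn't spell out the SPI clock): a quad-I/O read of a 32-byte cache line takes about 8 clocks of command, 6 of address, ~6 dummy clocks, and 64 clocks of data at 4 bits per clock, so call it ~84 SPI clocks. With the flash bus at 80MHz and the core at 320MHz, each SPI clock is 4 core clocks, so a single miss costs on the order of 330+ core cycles.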
I really hope they just forgot to mention 100kB+ of RAM on that landing page, and the 16kB data is just a DCACHE.
The limitation that I referred to (the same as the Open-V) is that it's very difficult to get IP for embedded NVRAM, and it requires more complicated processes. So this seems like the next best option. Yes, misses are going to be very expensive, but I'd rather have an icache than manually page code (plus you can use the 16KiB internal RAM for tight timing loops). More RAM would of course be better, but it's probably not the first thing I would add in the next revision (that would be more peripherals).
Also note that Quad SPI flash is 4 bits wide, and is NOR flash.
Yes, the process trade-offs are not so good if you want embedded flash. You need a process that can withstand the high voltages for flash erase, which means thick oxides and slow transistors.
At 320MHz, you should have an icache and paged code. A process that can clock this high should have no problem offering RAM densities well beyond 16kB. Better to mirror what the entire industry does: low clock speed and embedded flash, or a better process with faster clocks, more RAM, and external flash.
Quad SPI NOR is still SPI, with serial timing: you have to serialize the address and data in each transaction. Parallel NOR has parallel address and data lines, and an order-of-magnitude better throughput and latency.
There is really no point to 320MHz with such slow code memory.
Yeah, even for the Opus application I mentioned, 320MHz is way overkill - 50MHz would have been plenty. It kind of seems like for some reason they were able to get a modern process, but only a really tiny die. It's pretty interesting that the Open-V made the same tradeoff - 160MHz, but only 8KiB of RAM shared between code and data (with no memory-mapped SPI flash).
Yeah, bit-banging I2S would be gross. You could, however, use the PWM outputs for audio as long as the requirements aren't too strict. And of course, you could always implement it with extra hardware. Hoping for I2S in the next version :)
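A hedged sketch of the PWM approach, with a hypothetical PWM compare register and a timer interrupt assumed to fire at the sample rate; the PWM carrier just has to sit well above the audio band:

    #include <stdint.h>

    #define PWM_CMP (*(volatile uint32_t *)0x10015008u)  /* hypothetical duty-cycle register */

    extern const uint8_t samples[];   /* 8-bit unsigned PCM */
    extern uint32_t nsamples;

    /* Assumed to be wired to a timer firing at the sample rate (e.g. 16kHz). */
    void sample_tick_isr(void) {
        static uint32_t i;
        PWM_CMP = samples[i];         /* duty cycle 0..255 tracks the waveform */
        if (++i >= nsamples)
            i = 0;
    }

An RC low-pass filter on the pin then recovers the analog signal.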
This is the first time I've come across this concept (outside the ESP ecosystem); just to check I'm understanding this correctly, this means I can hook up a serial SRAM or equivalent device and the hardware will automatically demand-page arbitrary amounts of code out of it, right?
What's the throughput like? Can this be used for data as well (which will need caching too)?
Because this suddenly makes the device much more interesting; for everything I've done with microcontrollers (which, I'll admit, tends to be abusive), I would happily trade performance for some more RAM.
No, the QSPI device is simply memory-mapped, as in quite a lot of (most) microcontrollers.
The external QSPI FLASH just appears in the normal memory map, no demand-paging.
Normally it's used for eXecute-In-Place (XIP) code, when the code is too large for the internal FLASH.
There is a performance trade-off: external QSPI FLASH might run its bus at (for example) 50MHz, but that's still much slower than internal FLASH directly attached to the bus.
But SPI devices can't be memory-mapped directly: it's a serial protocol, and they can't be attached to the bus. For this to work, something must be converting bus accesses into SPI requests. And then, at least assuming it's not doing an SPI read for every access, it needs to cache the result somewhere... but you've just said it's not doing demand paging?
There is a hardware block within the processor that converts memory-bus accesses into SPI accesses (for devices supporting the SPIFI standard for example). This makes the whole SPI device appear to be memory-mapped (for read accesses).
It's got a standard two-way set-associative cache in front of the SPI interface, much like you might have between RAM and the external bus in a normal CPU. If available, demand paging would be the next step if the data wasn't found there either - a slower, higher-level mechanism that's handled in software rather than hardware.
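To make the software side concrete, a sketch assuming a hypothetical XIP window at 0x20000000: reads through it look like ordinary loads, and the controller plus cache do the SPI transactions behind the scenes:

    #include <stdint.h>

    #define FLASH_BASE ((const uint8_t *)0x20000000u)  /* hypothetical XIP window */

    uint32_t read_table_entry(uint32_t idx) {
        /* A plain load: on a cache miss, the SPI controller turns the
           line fill into a QSPI read transaction transparently. */
        const uint32_t *table = (const uint32_t *)(FLASH_BASE + 0x1000);
        return table[idx];
    }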
[1] https://dev.sifive.com/documentation/freedom-e300-platform-b...