I looked over the Chisel source code and the barebones datasheet [1]. This chip has a few notable features:
1. It doesn't have any onboard NVRAM (the same limitation as the Open-V). However, it does have a directly memory-mapped quad-SPI peripheral and an icache, which is a good alternative and might be better for applications that need a large amount of data. Note that an icache would still be required even with onboard NVRAM, because flash can't keep up with the core clock. You could also make swappable game cartridges, for example.
2. It has enough RAM and speed to run Opus.
3. The rest of the peripheral set is pretty barebones, with no analog peripherals like the Open-V has. No I2C or I2S without bit-banging, either.
4. The boot ROM has peripheral information stored in it. You might be able to have one binary that boots on different types of cores using this.
I'm worried this leads to the zoo of random hardware that we see on ARM, which makes supporting Linux distros on ARM such a PITA.
It would be better if the basic hardware — serial ports and such — was in a standard location for all RISC-V machines, and all the rest of the hardware was discoverable at runtime (like PCs, mostly).
The parent comment has proven oddly controversial going down to 0 and up to +5 and back down again. So let me try to do better and clarify what I mean. For context, I am maintaining Fedora on RISC-V: https://fedoraproject.org/wiki/Architectures/RISC-V .
On a PC, there is a serial port at a standard location. It's not "discoverable", but every PC you care about has one at the same address, and it's incredibly easy to use with a tiny bit of assembler. You can always print messages, even in the earliest parts of early boot.
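For the flavor of it, a minimal sketch of that early-boot path (hedged: this assumes the conventional COM1 base of 0x3F8 and that firmware has already programmed the baud rate):

    #include <stdint.h>

    #define COM1 0x3F8  /* conventional first serial port on every PC */

    static inline void outb(uint16_t port, uint8_t val) {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    static inline uint8_t inb(uint16_t port) {
        uint8_t val;
        __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    static void serial_putc(char c) {
        while (!(inb(COM1 + 5) & 0x20))   /* poll LSR until THR is empty */
            ;
        outb(COM1, (uint8_t)c);
    }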
On non-PC platforms (I've used ARMv7, ARMv8, POWER) there's a zoo of serial ports. There are at least a half dozen different bits of hardware, at different addresses, sometimes self-describing, sometimes not discoverable at all, and mentioned in the device tree in multiple different ways. Some even sit behind PCI, which means ugly hacks are needed to output early boot messages.
Critically, if you get the wrong serial port, you cannot see any boot messages at all. There's no way to tell what's going wrong.
So I think, for serial ports, the PC approach is definitely, clearly better.
For other hardware, it should just be self-describing. PCI and USB are the model examples here. And in fact, why not just use those?
I agree on the serial port thing. That's one device which I think should be architecture-standard. I can't imagine bringing up firmware or debugging a kernel without a low-level system serial port. It also makes remote management a lot easier, because the remote console can hook into that in a standard fashion.
It looks like we're moving towards a solution (in a slow, jerky, shambling way) by replacing hardcoded board files with device trees.
If RISC-V does this well out of the gate (like with IBM's old school Open Firmware) and has great shiny device tree support, I think we might end up like x86 -- where a single thumbdrive can boot almost any x86 machine under the sun.
Or you can solve it as done in GRLIB (www.gaisler.com/index.php/downloads/leongrlib), by having all on-chip peripheral information (memory and IRQ maps, etc.) specified in a ROM configuration area, which is automatically generated at synthesis time.
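For illustration, a hedged sketch of what scanning such a plug-and-play area might look like; the record layout here (8 words per slave, vendor/device IDs packed into word 0, slave records at 0xFFFFF800) is from memory of GRLIB's AMBA plug-and-play scheme, so treat the constants as assumptions and check the GRLIB manual:

    #include <stdint.h>

    /* Assumed location of the AHB slave plug-and-play records. */
    #define AHB_SLV_PNP ((const volatile uint32_t *)0xFFFFF800u)

    void scan_ahb_slaves(void) {
        for (int i = 0; i < 64; i++) {
            uint32_t id = AHB_SLV_PNP[i * 8];    /* 8 config words per device */
            if (id == 0)
                continue;                        /* empty slot */
            uint32_t vendor = id >> 24;          /* assumed field layout */
            uint32_t device = (id >> 12) & 0xFFF;
            /* ...look up (vendor, device) in a driver table... */
            (void)vendor; (void)device;
        }
    }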
Device trees need to be written to match the hardware, compiled, and supplied to the kernel. So it's not really as nice as "boot this kernel on any hardware".
Device trees are needed because manufacturers did not create self describing hardware.
> Device trees are needed because manufacturers did not create self describing hardware.
This is impossible to do beyond trivial components. Most devices are complex systems of interacting components.
Also, do you want the same lazy manufacturer that couldn't be bothered to create a device tree to create a complete hardware-description ROM, and get it right on the first attempt?
In practice "self describing" doesn't mean the hardware completely describes everything about itself. It only needs to describe enough that the OS can load the right driver and the driver can locate the hardware address, interrupts and so on. After that the complexity resides in the driver itself. PCI has been doing this sort of thing successfully for two decades, so we know it's possible. ACPI has been doing the same thing for power management, clocks, power zones, suspend etc, again for something like two decades.
In theory, maybe. But have you seen the corresponding code in the Linux kernel when it tries to probe a partially known component while working around all kinds of bugs, errata, and differences between silicon revisions? All those magic numbers and timeouts and dependencies that no one can really explain?
Configuration pages are a thing of the past; nowadays they should at best be used for confirmation and sanity checks. Device trees are the future and have already improved and simplified hardware management a lot (especially on ARM).
It would probably force the hardware manufacturer to think its design through a little more, which would be a good thing. I've seen enough of these SoCs with "complex systems of interacting components" to feel that a well-thought-out design needing less static description of SoC/board/CPU-level details would be beneficial.
You can bit-bang a lot with a 320MHz clock :-). That said, I wonder if we'll see a common standard emerge for an 'open' serial protocol. Sort of a PCIe-lite that runs at, say, 250MHz over a pair of LVDS pins and shows up as an AHB bus master in the memory matrix. That would really open up the peripheral market.
The frequency limit for bit-banging is usually set by the GPIO pins (well, their drivers) rather than the core clock. I use some ~250MHz PIC32MZs whose GPIOs cap out at around 50MHz, and even the Raspberry Pi, with a core clock of ~1GHz, can only bit-bang its GPIOs up to ~60MHz without running into problems. As for LVDS, it takes up more area and power, and you usually need to license the IP block rather than getting it for free from the standard cell library provider.
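To make the trade-off concrete, a hedged sketch of bit-banged SPI mode 0 output; GPIO_OUT and the pin masks are hypothetical stand-ins for whatever memory-mapped GPIO register a given part has, and the read-modify-write loop (not the core clock) is what bounds the achievable bit rate:

    #include <stdint.h>

    #define GPIO_OUT  (*(volatile uint32_t *)0x10012000u)  /* hypothetical address */
    #define PIN_SCK   (1u << 0)
    #define PIN_MOSI  (1u << 1)

    static void spi_send_byte(uint8_t b) {
        for (int i = 7; i >= 0; i--) {
            if (b & (1u << i))
                GPIO_OUT |= PIN_MOSI;    /* set the data line... */
            else
                GPIO_OUT &= ~PIN_MOSI;
            GPIO_OUT |= PIN_SCK;         /* ...then clock it out on the rising edge */
            GPIO_OUT &= ~PIN_SCK;
        }
    }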
Is there really a need? Most serial protocols can really be considered 'open'.
For low bandwidth there are RS232/UART/JTAG. Then there are I²C/SPI for multi-device buses: I²C ranges from 100kbit/s to 2.3Mbit/s, and SPI goes to 10-30Mbps.
For higher-bandwidth stuff there is whatever you want over USB, Bluetooth, 802.11-whatever WiFi, etc. AFAIK USB can be considered 'open': you might need to 'register' for a certification/vendor ID/logo, but from what I understand, if you don't want those you don't need to bother. There are also http://pid.codes/ and other organisations giving away free PIDs.
My experience is mainly from 8-bit CPUs, but unless you can afford one IRQ per bit, won't you just be busy waiting at 320 MHz instead? Performance tends to be less of a problem than the fact that you can't do anything else at the same time.
The ESP8266 and ESP32 have a very similar memory mapped quad-SPI setup. The (older, cheaper, more limited) ESP8266 is probably the most similar in terms of peripheral set etc, though that obviously still has more RAM, WiFi, I2C and I2S which this lacks.
Regarding point 1, this is a terrible design decision. Any low-power or high-performance uC that keeps code in external SPI loads it into RAM and executes from there. At 32MHz it starts to make sense to have an icache in front of embedded flash on a parallel bus. At 320MHz you want to cache your RAM. With 320MHz and external SPI (not even fast parallel NOR), you have to be insane. A single instruction-cache miss will cost you hundreds of clock cycles for a SPI read.
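To put rough numbers on that (these are assumptions; the datasheet doesn't spell out the SPI clock): a quad-I/O read of a 32-byte cache line takes about 8 clocks of command, 6 of address, ~6 dummy clocks, and 64 clocks of data at 4 bits per clock, so call it ~84 SPI clocks. With the flash bus at 80MHz and the core at 320MHz, each SPI clock is 4 core clocks, so a single miss costs on the order of 330+ core cycles.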
I really hope they just forgot to mention 100kB+ of RAM on that landing page, and the 16kB data is just a DCACHE.
The limitation that I referred to (the same as the Open-V) is that it's very difficult to get IP for embedded NVRAM, and it requires more complicated processes. So this seems like the next best option. Yes, misses are going to be very expensive, but I'd rather have an icache than manually page code (plus you can use the 16KiB internal RAM for tight timing loops). More RAM would of course be better, but it's probably not the first thing I would add in the next revision (that would be more peripherals).
Also note that Quad SPI flash is 4 bits wide, and is NOR flash.
Yes, the process trade-offs are not so good if you want embedded flash. You need a process that can withstand the high voltages for flash erase, which means thick oxides and slow transistors.
At 320MHz, you should have an icache and paged code. A process that can clock this high should have no problem offering RAM densities well beyond 16kB. Better to mirror what the entire industry does: low clock speed and embedded flash, or a better process with faster clocks, more RAM, and external flash.
Quad SPI NOR is still SPI, with serial timing: you have to serialize the address and data in each transaction. Parallel NOR has parallel address and data lines, and an order-of-magnitude better throughput and latency.
There is really no point to 320MHz with such slow code memory.
Yeah, even for the Opus application I mentioned, 320MHz is way overkill - 50MHz would have been plenty. It kind of seems like for some reason they were able to get a modern process, but only a really tiny die. It's pretty interesting that the Open-V made the same tradeoff - 160MHz, but only 8KiB of RAM shared between code and data (with no memory-mapped SPI flash).
Yeah, bit-banging I2S would be gross. You could, however, use the PWM outputs for audio as long as the requirements aren't too strict. And of course, you could always implement it with extra hardware. Hoping for I2S in the next version :)
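A hedged sketch of the PWM approach, with a hypothetical PWM compare register and a timer interrupt assumed to fire at the sample rate; the PWM carrier just has to sit well above the audio band:

    #include <stdint.h>

    #define PWM_CMP (*(volatile uint32_t *)0x10015008u)  /* hypothetical duty-cycle register */

    extern const uint8_t samples[];   /* 8-bit unsigned PCM */
    extern uint32_t nsamples;

    /* Assumed to be wired to a timer firing at the sample rate (e.g. 16kHz). */
    void sample_tick_isr(void) {
        static uint32_t i;
        PWM_CMP = samples[i];         /* duty cycle 0..255 tracks the waveform */
        if (++i >= nsamples)
            i = 0;
    }

An RC low-pass filter on the pin then recovers the analog signal.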
This is the first time I've come across this concept (outside the ESP ecosystem); just to check I'm understanding this correctly, this means I can hook up a serial SRAM or equivalent device and the hardware will automatically demand-page arbitrary amounts of code out of it, right?
What's the throughput like? Can this be used for data as well (which will need caching too)?
Because this suddenly makes the device much more interesting; for everything I've done with microcontrollers (which, I'll admit, tends to be abusive), I would happily trade performance for some more RAM.
No, the QSPI device is simply memory-mapped, as in quite a lot of (most) microcontrollers.
The external QSPI FLASH just appears in the normal memory map, no demand-paging.
Normally it's used for eXecute-In-Place (XIP) code, when the code is too large for the internal FLASH.
There is a performance trade-off: external QSPI FLASH might run its bus at (for example) 50MHz, but that's still much slower than internal FLASH directly attached to the bus.
But SPI devices can't be memory-mapped directly: it's a serial protocol, and they can't be attached to the bus. For this to work, something must be converting bus accesses into SPI requests. And then, at least assuming it's not doing an SPI read for every access, it needs to cache the result somewhere... but you've just said it's not doing demand paging?
There is a hardware block within the processor that converts memory-bus accesses into SPI accesses (for devices supporting the SPIFI standard for example). This makes the whole SPI device appear to be memory-mapped (for read accesses).
It's got a standard two-way set-associative cache in front of the SPI interface, much like you might have between RAM and the external bus in a normal CPU. If available, demand paging would be the next step if the data wasn't found there either - a slower, higher-level mechanism that's handled in software rather than hardware.
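To make the software side concrete, a sketch assuming a hypothetical XIP window at 0x20000000: reads through it look like ordinary loads, and the controller plus cache do the SPI transactions behind the scenes:

    #include <stdint.h>

    #define FLASH_BASE ((const uint8_t *)0x20000000u)  /* hypothetical XIP window */

    uint32_t read_table_entry(uint32_t idx) {
        /* A plain load: on a cache miss, the SPI controller turns the
           line fill into a QSPI read transaction transparently. */
        const uint32_t *table = (const uint32_t *)(FLASH_BASE + 0x1000);
        return table[idx];
    }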
[1] https://dev.sifive.com/documentation/freedom-e300-platform-b...