In addition to the ZipCPU itself, the zipcpu.com website has many useful blog posts for the beginning FPGA developer. I've found it to be a great resource. Mr Gisselquist (a.k.a. ZipCPU) is an active contributor to a number of FPGA forums and, in my experience, is always helpful to newcomers.
I'm curious, is there any commercial application for running Linux on a CPU implemented in an FPGA, as opposed to a (presumably cheaper and faster) normal CPU? Is there some kind of custom modification this would allow you to make?
I've always thought of FPGAs as being for custom, computationally intensive parallel tasks that run side-by-side with a normal CPU. So I'm just curious if ZipCPU is just for fun, or something more?
(Or is there even something security-related about being able to verify not just the code, but the CPU it runs on, in this case?)
Sure, it happens all the time if you want a full OS on a core that's tightly integrated with some custom hardware, but you don't want an ASIC (for reconfigurability, or just low volumes).
I'm not a hardware engineer, but my understanding is that some FPGAs include an entire CPU core as part of the chip? This seems much more efficient than designing your own.
Super small CPU cores are fantastic for complex FSM replacement, especially on an FPGA that has unused block RAMs (so using them is essentially free).
I've seen cases where a relatively complex pure-HW FSM was replaced by a small CPU and 1KB of RAM, and the logic used by the CPU ended up smaller than the FSM.
But now the CPU is programmable, so you can iterate much more quickly when there are bugs, without needing to resynthesize.
A good example is SDRAM controllers: almost all of them have a small CPU inside the controller that is used for calibration training.
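To make that concrete, here's a rough C sketch of the kind of read-delay training loop such a calibration CPU might run (the register names and addresses are made up purely for illustration): sweep the capture-delay taps, note which ones return the training pattern cleanly, and park in the middle of the passing window. Since it's firmware, a bad guess costs a recompile rather than a resynthesis.

    /* Hypothetical memory-mapped registers for the read-delay sweep. */
    #include <stdint.h>

    #define DELAY_TAP_REG  (*(volatile uint32_t *)0x40000000u)  /* made-up address */
    #define READ_DATA_REG  (*(volatile uint32_t *)0x40000004u)  /* made-up address */
    #define TRAIN_PATTERN  0xA5A5A5A5u
    #define NUM_TAPS       32

    /* Returns 0 on success, -1 if no delay tap captured the pattern. */
    int calibrate_read_delay(void)
    {
        int first_pass = -1, last_pass = -1;

        for (int tap = 0; tap < NUM_TAPS; tap++) {
            DELAY_TAP_REG = (uint32_t)tap;         /* move the capture point */
            if (READ_DATA_REG == TRAIN_PATTERN) {  /* did the burst come back clean? */
                if (first_pass < 0)
                    first_pass = tap;
                last_pass = tap;
            }
        }
        if (first_pass < 0)
            return -1;                             /* no working tap at all */

        /* Park in the middle of the passing window. */
        DELAY_TAP_REG = (uint32_t)((first_pass + last_pass) / 2);
        return 0;
    }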
It's almost always more efficient to use an existing CPU, if it fits your use case, than to design your own. Even if it doesn't, it is often more efficient (time-wise) to adapt your use case to the hardware you have available, or to write software to provide the feature you want in hardware but don't have. Only the especially nutty build their own CPUs (I'm especially nutty too).
The cheapest FPGAs are under $10 and many can fit a small softcore - usually not a fully pipelined, cached RISC-V core with an MMU at 100+ MHz, but enough to drastically simplify high-level logic in FPGA projects. The Zynq series with onboard CPUs starts at $40-50 in low volumes and has 1-4 cores at 600 MHz, which is complete overkill for many FPGA projects.
Slow, $2000-a-unit hardware might be worth it if it makes your OS close to unhackable. I'd just have to make sure apps are lean and efficient. I already favor those, though. :)
Also, there's been work in "instruction set randomization" that obfuscates the CPU's instruction set itself to reduce the odds that malware will work. It mostly stops widely-deployed, non-custom attacks, but it might stop some custom attackers, too.
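As a toy illustration of the idea (not any particular product's scheme), here's instruction set randomization in a few lines of C: the legitimate program's opcodes are XORed with a per-program key when it's loaded and un-XORed at fetch, so injected code written without the key decodes to garbage and traps instead of running.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Tiny stack machine; the "randomized" instruction set is just XOR with a key. */
    enum { OP_PUSH = 1, OP_ADD = 2, OP_PRINT = 3, OP_HALT = 4 };

    static void run(const uint8_t *code, size_t len, uint8_t key)
    {
        int32_t stack[16];
        int sp = 0;
        for (size_t pc = 0; pc < len; ) {
            uint8_t op = code[pc++] ^ key;            /* de-randomize at fetch */
            switch (op) {
            case OP_PUSH:  stack[sp++] = (int8_t)(code[pc++] ^ key); break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
            case OP_HALT:  return;
            default:
                fprintf(stderr, "illegal opcode %u: likely injected code\n", (unsigned)op);
                exit(1);
            }
        }
    }

    int main(void)
    {
        const uint8_t key = 0x5a;  /* per-program random key */

        /* Legitimate program, encoded with the key: push 2, push 3, add, print, halt. */
        uint8_t prog[] = { OP_PUSH ^ key, 2 ^ key, OP_PUSH ^ key, 3 ^ key,
                           OP_ADD ^ key, OP_PRINT ^ key, OP_HALT ^ key };
        run(prog, sizeof prog, key);                  /* prints 5 */

        /* "Injected" code written without the key decodes to garbage opcodes. */
        uint8_t injected[] = { OP_PUSH, 7, OP_PRINT, OP_HALT };
        run(injected, sizeof injected, key);          /* hits the illegal-opcode path */
        return 0;
    }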
Older work was useful for letting RAM and other components be untrusted, or less trusted, using crypto setups. Joshua Edmison did one of those with a nice survey of others:
Finally, Meltdown/Spectre have recently prompted lots of designs to plug timing channels. Too many to even track. Here's the first one I found from my submissions:
Considering the implications of a widely available unhackable computer on real security, crime, and politics, and that many powerful people/organizations are working to prevent distribution of this technology, do you really think it would become available? How?
Computers that are ridiculously hard to hack, and methods for building them, have been available for some time. NSA used to evaluate them for years on end trying to hack them. If they passed the test, they got the TCSEC A1-class label for verified[-as-possible] security. SCOMP/STOP, GEMSOS, Boeing SNS Server, LOCK, and KeyKOS w/ KeySAFE are examples of systems designed with such techniques or certified. I think the STOP-based XTS-400 and Boeing SNS Servers have been in deployment for decades now without any reported hacks. They were on the munitions list for export restriction (idk if enforced), some companies sell high-security products to defense only, and they all cost a ton of money if proprietary.
Under newer standards and techniques, examples include INTEGRITY-178B, the Perseus Security Framework (e.g. Turaya Desktop), seL4, the Muen hypervisor, and ProvenCore. Two of those are open source, with one more having had pieces released. On the hardware side, the groups developing it often openly describe the stuff so that anyone with money can implement it. Some openly release it, like the CHERI CPU w/ FreeBSD. Some are commercial products that make every instruction safe, like CoreGuard.
So, it's not a question of whether people will develop and sell this stuff. They've been developing it since computer security was invented back in the 1970s. It was sometimes the inventors of INFOSEC themselves developing and evangelizing it. Businesses didn't buy the stuff for a variety of reasons that had nothing to do with its security. Sometimes, especially with companies like Apple or Google, they could clearly afford the hardware or software, it was usable for some to all of their use cases, and they just didn't build or buy it for arbitrary reasons of management. Most stuff they do in-house is worse than published, non-patented designs, which is just more ridiculous.
DARPA, NSF, and other US government agencies continue to fund the majority of high-security/reliability tech that gets produced commercially and/or for FOSS. These are different groups than the SIGINT people (i.e. BULLRUN) that want to hack everything. Also, they might be putting one backdoor in the closed ones for themselves while otherwise leaving it ultra-secure against everyone else. That's what I've always figured. Lots of it is OSS/FOSS, though, so that's easier to look at.
Infrant used to sell a NAS product that ran Linux on a SPARC implemented in an FPGA. No idea why, but it was a commercial success. They moved to ARM or x86 around the time Netgear acquired the company.
The project was started in September 2016, which was also the month I first started porting Fedora to RISC-V. Given the momentum behind RISC-V, and the huge amount of software now available, it would obviously make sense to use a RISC-V-compatible design. However, hobbyists can do what they want; there's no reason to force someone who wants to design their own CPU to do it in any particular way.
The specification actually mentions RISC-V, and why I didn't use it. In hindsight, it's harder to decode the RISC-V instruction set than the one for the ZipCPU--too many holes in strange places.
Why are you being obtuse, rather than trying to be clear?
RISC-V is the major Free and Open CPU. RISC-V can run on FPGA. It's natural to want a comparison of this project against RISC-V, especially considering the write-up mentions the less successful OpenRISC project.
This page appears to have been written in summer of 2016. Back then RISC-V didn't even have a stable privileged instruction set yet, so there wasn't really a RISC-V to compare it to in a meaningful sense.
PicoRV32[0] targets some extremely low spec FPGAs. I have it running on a ridiculously small and cheap FPGA[1] that has 8k LUTs and cost me £41 including tax and next-day delivery.
However, I repeat what I said above: hobbyists are free to design their own instruction sets, and good on them!
The ZipCPU and several peripherals can fit within an iCE40 8k using only about 5k LUTs--and that's with the multiply unit, the divide unit, the pipelined fetch unit, and a debugging bus infrastructure. While that isn't the fully pipelined implementation, and the caches don't fit either, it is sufficient to be competitive with the PicoRV32.
The PicoRV32 is a nice little processor, but it's surprisingly inefficient in its use of FPGA resources relative to the performance it offers (which is very low). For small FPGAs, it's far from being the best solution.
That said: there are a bunch of SOC construction tools for it, so that's definitely a redeeming factor.
But then again, nearly every CPU runs on an FPGA these days: the 6502, the Z80, various mainframe CPUs, x86, and more modern CPUs. Modern CPU development often goes via FPGA - after all, the CPU development tools are the same: Verilog, for example, which you first synthesize to an FPGA and, after some rounds of testing, to silicon.
You may be able to run certain modern CPU units on an FPGA, maybe even one modern core, but compare the 4 million LUTs of the biggest FPGA to the 19 billion transistors of a 32-core AMD EPYC. With a LUT being roughly equivalent to anywhere from 1 to 50 ASIC transistors, depending on the logic implemented, there's still an order-of-magnitude gap or more between them, even in the best case.
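Back-of-the-envelope with those (admittedly rough) numbers: 4 million LUTs × 50 transistors per LUT ≈ 200 million transistor-equivalents, against ~19 billion transistors, which is roughly a 95x gap even granting the FPGA its best-case equivalence.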
In practice, Verilog simulators are used instead; they're as slow as a dead horse on smack, but it's the only way to get full (logic) system integration of such big designs.
There's a halfway point, confusingly called emulators. You get a whole rack of FPGAs (or sometimes giant arrays of tiny little CPUs that only run boolean ops) connected to each other, running at tens of MHz with the signals between them multiplexed.
> we verify a fairly realistic implementation of a multicore shared-memory system with two types of components: memory system and processor. Both components include nontrivial optimizations, with the memory system employing an arbitrary hierarchy of cache nodes that communicate with each other concurrently, and with the processor doing speculative execution of many concurrent read operations. Nonetheless, we prove that the combined system implements sequential consistency. To our knowledge, our memory-system proof is the first machine verification of a cache-coherence protocol parameterized over an arbitrary cache hierarchy, and our full-system proof is the first machine verification of sequential consistency for a multicore hardware design that includes caches and speculative processors.
That paper doesn't have anything to do with what the original poster said, which is that extremely large scale VLSI-style designs don't use FPGAs, but instead use large-scale, extremely expensive and fast simulators instead. That's basically correct, regardless of the kinds of formal methods that are in play: simulation is vastly cheaper and more feasible at large scales, so much so that people instead buy hardware accelerators for simulation to improve development time and get reasonable turnaround on complex components.
Smaller-scale embedded systems or bespoke designs can get away with full FPGA flows for low-scale deployment/development (e.g. SiFive's CPU cores all run this way, and you can run them) -- and pretty much all hobbyist CPUs do so. But to say all modern CPUs use FPGAs in their development cycle is overstating it a bit. There simply are no devices with the kind of logic density needed to realistically model many of the large-scale, high-volume designs you see today, not to mention the unbelievable synthesis times you'd have to endure.
> The scope of verifiable components is fundamentally limited by the state-space-explosion problem. For example, in order to verify cache-coherence protocols with Murphi [Dill et al.1992], a widely used model checker designed for that exact task, at one point the biggest system that could be verified had only 3 cores interacting with a single-address memory having two potential values. While improvements to algorithms and hardware (for running the analysis) continually increase the feasible system size, there are fundamental limitations of such tools that explore finite state spaces, in the face of exponential growth in state spaces as we add processor cores or memory addresses.
> In this paper, we introduce the Kami framework for the Coq proof assistant, which brings this style of development and verification to the world of computer architecture, simultaneously reversing all of the weaknesses enumerated above for most hardware-verification tools. Kami makes it possible to code and verify a fairly realistic processor within Coq and then extract it to run on FPGAs (and almost certainly in silicon, though we have not yet had the opportunity to fabricate a “real” chip). We also emphasize that our use of “verified” is more in the tradition of CompCert [Leroy 2006] or seL4 [Klein et al. 2009] than of the day-to-day practices of verification engineers in the hardware industry: we prove that a system with any number of shared-memory cores with associated coherent caches correctly implements the functional-correctness specification of our instruction set, with respect to sequential consistency.