PIO is indeed awesome! A while ago I was wondering if the RP2040 could be used as a scandoubler (and talked about it with Chrissy, who left a comment on your blog). I was originally discouraged and decided not to pursue it because the ADCs on the RP2040 are just too slow to capture video, so I'm happy to see the results of your effort!
As soon as I read about the programmable I/O setup, it reminded me of the Propeller series chips from Parallax. The "cogs" that are the main processors of those chips aren't as general-purpose as something like the STM32 cores, but they do certain things very, very well.
The P2 is a huge step up from the old Propeller. If you haven't checked it out, I recommend it. Its "smart pins" are able to do all kinds of things, almost like having mini cores alongside the main cogs. I used it for a project and was able to achieve all my goals. Really happy with it.
You've got me intrigued, but also wanting more specifics. Did you share any details of this anywhere you can link to? (Don't worry if not, I'm just nosy.)
https://www.parallax.com/propeller-2/ tons of improvements in terms of memory access etc, but pay attention to the "64 smart pins" section specifically. I use them to buffer out sub-microsecond waveforms 3ms at a time. I send it off to the smart pins then my cog is freed up for other things. I also previously used some for an HDMI GUI.
If AMD incorporates scaled-down Xilinx FPGAs into its x86 product line, that could bring a lot of the Raspberry Pi community's effort into mainstream products too (home PCs) and let us experiment with embedded software directly on our PCs! ...and break our main PC during our experiments too, oopsie. But it would be worth it haha.
Always keep in mind that one of the harshest limits on PC design is the number of pins on the CPU.
I really doubt we will see GPIO pins available directly from the CPU, and if they don't come directly from there, there isn't much difference from using a PCIe or USB adapter.
Or, in other words, what you want can already be done about as well as it will ever get. The hype for adding FPGAs into PCs is about using them as co-processors, completely inaccessible to any other hardware.
AM4 has 12 dedicated GPIO pins and ~30 pins where GPIO is shared with other functions. On the other hand these are mostly meant for platform control, and maybe blinking LEDs, not for user-space bitbanging something there.
Also of note is that both Ryzen IODs and Intel PCHs contain what could be called "half of an ESP32" connected to a few IO pins under the name of on-board HD audio.
What you're asking for now exists, Zen 4 mobile SKUs allegedly ship a Xilinx design on the die for "AI Acceleration" (some of their Versal fabric over some weird bus), that has absolutely 0 external software consumers beyond some vaporware about video effect software for Windows 11 e.g. background image removal and background noise removal. They really just aren't very easy to program or use externally, and require lots of integration work, and that remains a major limiting factor in practice. The pure silicon-area overhead is also pretty severe compared to a fixed ASIC (think ~50-100x worse), limiting their practical size.
There are other considerations; large FPGAs are kind of slow to program and have limited or fixed support for multi-tenancy. For example, you have to carve up the device into fixed units ahead of time and divvy those out, and unused resources cannot be reused. It seems like "time-multiplexed" FPGAs, such as what Tabula was trying to accomplish before going bankrupt, might be better suited to that, though they come with other tradeoffs. I do wish you could get something high-speed attached to a desktop-class processor.
Fun peripherals aren't really the reason for the RPi's large community, anyway. That result is mostly a mix of software support, pricing, and being in the right place at the right time.
"compared to a fixed ASIC" seems like a bit of a harsh comparison.
The ideal fixed ASIC is as die-facilities-efficient a solution to a particular problem as you're going to get. The ideal FPGA is as generalised a solution to a large bucket of problems as you can get. Do they have to compete?
Ease of programmability though, there I agree and more. A chip facility can't be exciting or even interesting if it's hidden behind being a giant pain in the backside to drive.
(disclaimer: I used to be really interested in this stuff, but the problems I was interested in were eaten up by general processors and simple uses of GPUs and I'm just not interesting enough to have problems that really justify exciting hardware any more... more power to you if you still do)
> Fun peripherals aren't really the reason for the RPi's large community
What many might not realize is that the RP2040 got a massive boost due to supply chain issues affecting the STM32 line. We had no choice but to redesign a board to adopt the RP2040 when STM32s were being quoted at 50+ week lead times. It was a black swan event like no other.
We would have never touched the RP2040 without such an overwhelming forcing function in place. The chip has serious shortcomings (example: no security) and the company could not give one shit about the needs of professional product developers. Just asking for proper support under Windows was a nightmare.
Not sure if things have changed, but at the time they seemed to have no understanding of how real products are developed, tested, qualified, certified, evolved and supported over time.
It's one thing to make little boards for educational markets. It's quite another to build embedded systems that are part of complex, multidisciplinary products with non-trivial service lifetimes and support requirements.
We dropped the RP2040 like a hot potato as soon as STM32's became available.
Making the decision to redesign the boards was a no-brainer. On the one side you are dealing with a company that makes educational boards that have the luxury of appealing to an audience that shrugs off such things as reliability, tools and manufacturing process integration. On the other side (STM), you have the support of an organization and an ecosystem that has been dedicated to meeting the needs of professional product developers for decades. The difference, from my side of the fence, is impossible to miss. Black swan events sometimes make you do things you will live to regret. For me, this was one of them.
BTW, I do like aspects of this chip. Someone should take it and run with it in a professional manner. Raspberry Pi Ltd. isn't that company. It wasn't until an engineer from India did the hard work to attempt to create a better experience under Windows that the company "released" a solution. This "solution" resorts to such things as reinstalling VSCode. Brilliant.
Coarse-Grained Reconfigurable Arrays (CGRAs) are the only way I see accelerators taking off. They reconfigure a lot faster, and I believe they have better area utilization at the expense of bit-level programmability. I don't see many use cases for an FPGA's bit-level reconfigurability in an accelerator anyway, so I doubt it would be missed.
While I love the idea of FPGA co-processors on CPUs, I'm wondering how useful they could be. I guess you could replace the video transcoding unit so you're not tied to one codec, but how often do those change anyway?
I must admit I did not think this whole thing through fully.
But what was interesting to me was the fact that you could add a peripheral that was not initially intended by the manufacturers, making a mainstream motherboard more versatile.
For you it may be a video transcoding unit, for someone else it may be an SPI or I2C device, PCIe, or extra ethernet, or high quality audio.
I'm not sure what peripherals were implemented by the community for the RP2040 either, maybe they would not make sense on a PC.
Maybe PCIe already does something similar, that's not something I have knowledge about.
There is a small difference, in my opinion: from the point of view of the CPU, it should behave as a normal interface, so the driver should already exist and only a change in the device tree (for Linux) should be needed.
It would still require quite a bit of work:
- The PIO has to be bug-for-bug compatible with an existing driver
- The exposed pins need to have the proper voltage levels & electrical protection
There are a few FPGA dev kits that are set up this way. I have one with a dual-core Atom CPU and a large FPGA connected via PCIe, and it's fast.
In that case the CPU is basically the co-processor for the FPGA. I've yet to see a use that wasn't primarily using the FPGA because it needed to be an FPGA, they're not great if you just want to run something fast (outside of a few small uses).
I stumbled upon this project a few weeks ago, while starting to get a better handle on how to use PIO (and the rest of the RP2040 infrastructure, like PIO/DMA interactions). I searched for it at first, thinking that an HDL implementation of PIO would be a good way to test and simulate PIO interactions and timings. It turns out there are some neat and more specific PIO simulator/emulator projects out there, so I've moved that way.
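For anyone wondering what the core of a PIO emulator looks like, here is a minimal sketch in C. It is not one of the projects mentioned above: it models a single state machine, decodes only SET and the simpler JMP conditions from the 16-bit RP2040 PIO instruction encoding, counts delay cycles, and assumes side-set is disabled. FIFOs, IN/OUT/WAIT/MOV/IRQ and configurable wrap are left out, and `pio_sm`, `pio_step` and `pio_trace` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of ONE RP2040 PIO state machine (hypothetical helper, not an
 * existing library). Only SET and the always/X/Y JMP conditions are decoded;
 * side-set is assumed disabled, so bits 12:8 are pure delay. */
typedef struct {
    const uint16_t *prog; /* instruction memory, loaded at address 0 */
    int len;              /* program length; PC wraps from len-1 back to 0 */
    int pc;               /* program counter */
    uint32_t x, y;        /* scratch registers */
    uint32_t pins;        /* last value written by SET PINS */
    int delay;            /* delay cycles still to burn */
} pio_sm;

/* Advance the state machine by one clock; returns the pin value afterwards. */
uint32_t pio_step(pio_sm *sm) {
    if (sm->delay > 0) {             /* still stalled by a [n] delay */
        sm->delay--;
        return sm->pins;
    }
    assert(sm->pc >= 0 && sm->pc < sm->len);
    uint16_t ins = sm->prog[sm->pc];
    int op = ins >> 13;              /* bits 15:13: opcode */
    int delay = (ins >> 8) & 0x1F;   /* bits 12:8: delay (no side-set) */
    int sel = (ins >> 5) & 0x7;      /* bits 7:5: SET dest / JMP condition */
    int data = ins & 0x1F;           /* bits 4:0: SET data / JMP address */
    int next = (sm->pc + 1) % sm->len;

    if (op == 7) {                               /* SET */
        if (sel == 0) sm->pins = data;           /* PINS */
        else if (sel == 1) sm->x = data;         /* X */
        else if (sel == 2) sm->y = data;         /* Y */
    } else if (op == 0) {                        /* JMP */
        int take = 0;
        switch (sel) {
        case 0: take = 1; break;                     /* always */
        case 1: take = (sm->x == 0); break;          /* !X */
        case 2: take = (sm->x != 0); sm->x--; break; /* X-- (test, then decrement) */
        case 3: take = (sm->y == 0); break;          /* !Y */
        case 4: take = (sm->y != 0); sm->y--; break; /* Y-- */
        default: break; /* X!=Y, PIN, !OSRE not modelled: never taken */
        }
        if (take) next = data;
    } /* all other opcodes are treated as NOPs in this sketch */

    sm->pc = next;
    sm->delay = delay;
    return sm->pins;
}

/* Run `cycles` clocks from reset and record pin 0 as '0'/'1' in `out`,
 * which must have room for cycles + 1 chars. */
void pio_trace(const uint16_t *prog, int len, int cycles, char *out) {
    pio_sm sm = { .prog = prog, .len = len };
    for (int i = 0; i < cycles; i++)
        out[i] = (pio_step(&sm) & 1u) ? '1' : '0';
    out[cycles] = '\0';
}
```

With the two-instruction blink program `set pins, 1 [3]` / `set pins, 0 [3]` (encoded as 0xE301, 0xE300), `pio_trace` over 16 cycles gives "1111000011110000", since each SET takes one cycle plus three delay cycles. Cycle-exact traces like this are what make an emulator handy for checking PIO timing before touching hardware.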
Still, this project is quite interesting. I have a design I'm working on that would really benefit from 4 PIO blocks all cooperating on a piece of RP silicon. Since that's not likely to ship anytime soon, I am vaguely interested in piecing together some HDL that stitches together a RISC-V core with several PIOs.
Not sure I understand what you are saying. If you have an FPGA you can do anything you want, better and faster than the RP PIO hardware. Why replicate it?
If you are talking about creating microcoded state machines in an FPGA, once again, this is the kind of thing that is almost trivial. Examples of subsystems that can be implemented this way are dynamic RAM controllers, sophisticated programmable image processors/scalers, etc.
I considered addressing this point in my initial post, but decided not to open the can of worms. You're right, if I have an FPGA I can do anything I want. But what I want to do is take existing RP2040 PIO programs, and write RP2040 SDK C code to run them and interact with them, but I want to drive more pins and manage more state than the RP2040 offers. If I have a design that takes two RP2040s today and some very careful communication between them, I'd love to try and run it on FPGA but not throw out all the work.
Not sure how that might work. To run compiled RP2040 SDK C PIO code on a PIO "clone" inside an FPGA you are going to have to do a ton of work. That's kind of where I was going with my question. By the time you do all that work to use an FPGA you might as well implement the functionality in hardware.
I love FPGAs...and yet I hate them with a passion. When they are the right fit for the job you can't beat them. However, this comes at a pretty serious cost in terms of debug cycles per day, troubleshooting complexity and just plain time. I am known for saying that developing a solution on an FPGA takes "cubic time" when compared to other options.
I have worked on a number of projects where compile times are in excess of an hour and debugging is excruciatingly painful. You need an 18 hour day just to iterate through two to four attempts at fixing a problem. Like I said, painful.
These days I avoid them as much as possible or, when necessary, I prefer to license well-designed and well-tested cores that just work.
One thing that's nice is that if you have this, you can use an FPGA as a devkit, figure out all the bits and pieces that you need, then use the RP2040 in a "final" product.
FPGAs are obviously cool and powerful, but there's a pretty big cost differential.
On top of that, an FPGA is not a replacement for a CPU in itself. Soft cores are costly in gates. Of course "FPGA + CPU" is a thing, but in that case... if you can get away with a CPU by itself your tooling gets a lot simpler.
The Pi engineers claim that at least some aspects of PIO are "patent-pending," [0] so a 1:1 reimplementation in another MCU would probably infringe on some claim or another eventually.
Yes, software defined I/O is not uncommon which is one reason I'm quite curious about the specific claims in the PIO patents. Another good example is the Infineon Peripheral Control Processor (PCP) used in Tricore. It's much more powerful ISA wise than PIO, but conceptually extremely similar.
I came here to inquire about this. The relevant patents haven't been published yet? I wouldn't include this in a commercial design due to potential infringement.
You can capture video in a weird format with PIO on one core, and output it with PIO (in a standard format like VGA, or even DVI) on the other core, like here: https://blog.qiqitori.com/2022/09/raspberry-pi-pico-15-6-khz...
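The scandoubler idea from the top of the thread is, in its simplest form, line repetition: every captured line is sent out twice, so the output has twice as many lines at twice the horizontal rate. Below is a plain-C sketch of only that step, assuming one byte per pixel and whole-frame buffers; PIO capture, VGA/DVI output and the two-core split are left out, `scandouble` is a hypothetical name, and this is not the code from the linked posts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical line-doubling helper: output line o shows captured line o / 2,
 * so a frame of `lines` lines becomes 2 * lines lines. Sent out at twice the
 * horizontal rate, this is how e.g. a ~15.6 kHz signal becomes ~31 kHz.
 * `in` holds lines * width pixels; `out` must hold 2 * lines * width. */
void scandouble(const uint8_t *in, int lines, int width, uint8_t *out) {
    assert(lines >= 0 && width > 0);
    for (int o = 0; o < 2 * lines; o++)
        memcpy(out + (size_t)o * width,
               in + (size_t)(o / 2) * width,
               (size_t)width);
}
```

On a real device this would more likely run from a small ring of line buffers than from whole frames, since each captured line is only needed for the next two output lines.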
Or you can implement old DACs that expect a weird input data format, to a certain extent, like here: https://blog.qiqitori.com/2023/03/raspberry-pi-pico-implemen...
(Now I'm almost at the end of my sabbatical but think these projects (and others) were totally worth doing even if it meant living off savings, heh.)