So, this is my project! Was somewhat hoping to wait until there was a bit more content up on the site before it started doing the rounds, but here we are! :)
To answer what seems to be the most common question I get asked about this, I am intending on open-sourcing the entire stack (PCB schematic/layout, all the HDL, Windows WDDM drivers, API runtime drivers, and Quake ported to use the API) at some point, but there are a number of legal issues that need to be cleared (with respect to my job) and I need to decide the rest of the particulars (license, etc.) - this stuff is not what I do for a living, but it's tangentially-related enough that I need to cover my ass.
The first commit for this project was on August 22, 2021. I've been working on this for a bit over two and a half years, and while I didn't write anything up during that process, there are a fair number of videos in my YouTube FuryGpu playlist (https://www.youtube.com/playlist?list=PL4FPA1MeZF440A9CFfMJ7...) that can give you an idea of how things progressed.
The next set of blog posts that are in the works concern the PCIe interface. It'll probably be a multi-part series starting at the PCB schematic/layout, moving through the FPGA design, and ending with the Windows drivers. No timeline on when that'll be done, though. After having written just that post on how the Texture Units work, I've got even more respect for those who can write up technical stuff like that with any sort of consistency.
I'll answer the remaining questions in the threads where they were asked. Thanks for the interest!
I have seen semi-regular updates from you on discord and it is awesome to see how far this project has come (and also a bit frustrating to see how relatively little progress I have made on my FPGA projects in the same time!). I was hoping you'd do a writeup, can't wait!
Googling the Xilinx Zynq UltraScale+, it seems kinda expensive.
Of course plenty of hobbies let people spend thousands (or more) so there's nothing wrong with that if you've got the money. But is it the end target for your project? Or do you have ambitions to go beyond that?
Let's be clear here: this is a toy. Beyond being a fun project to work on that could maybe get my foot in the door were I ever to decide to change careers and move into hardware design, this is not going to change the GPU landscape or compete with any of the commercial players. What it might do is pave the way for others to do interesting things in this space. A board with all of the video hardware that you can plug into a computer, with all the infrastructure available to play around with accelerating graphics, could be a fun, if extremely niche, product. That would also require a *significant* time and money investment from me, and that's not something I necessarily want to deal with. When this is eventually open-sourced, those who really are interested could make their own boards.
One thing to note is that while the US+ line is generally quite expensive (the higher-end parts sit in the five-figure range for a one-off purchase! No one actually buying these is paying that price, but still!), the Kria SOMs are quite cheap in comparison. They've got a reasonably powerful Zynq US+ for about $400, or around $350 for the dev boards (which do not expose some of the high-speed interfaces like PCIe). I'm starting to sound like a Xilinx shill given how many times I've re-stated this, but for anyone serious about getting into this kind of thing, those dev boards are an amazing deal.
Yeah, you're referring to the Linux kernel, but software is much cheaper to design, test, build, scale, and turn profitable than hardware, especially GPUs.
Open-source GPUs won't threaten Nvidia/AMD/Intel anytime soon, or ever. They're way too far ahead in the game and also backed by patents if any new player were to become a threat.
I've been told by several people that distributor pricing for FPGAs is ridiculously inflated compared to what direct customers pay, and considering that one can apparently get a dev board on AliExpress for about $110 [1] while Digikey lists the FPGA alone for about $1880 [2], I believe it (this example isn't an UltraScale chip, but it is significantly bigger than the usual low-end Zynq 7000 boards sold to undergrads and tinkerers).
This is both true and false. I work with Intel/Altera, but Xilinx is basically the same.
That devboard is using recycled chips 100 percent. Their cost is almost nothing.
The Kintex-7 part in question can probably be bought in volume for around $190. Think 100k EAU (estimated annual usage).
This kind of price break comes with volume and is common with many other kinds of silicon besides FPGAs. Some product lines have more pricing pressure than others; for example, very popular MCUs may not get as wide a price break. Some manufacturers price more fairly to distributors, some allow very large discounts.
I have some first- and second-hand experience with this, and you are correct. I'm not sure who benefits from this practice. It's anywhere from 5-25x cheaper in even small-ish quantities.
I'm personally too far from those negotiations to offer any likely-pivotal insight (such as a concrete quantity), but my very rough understanding is that there's some critical volume beyond which a customer basically becomes "made" with the Xilinx/Altera sales channels via a financially significant design win, at which point sales engineers etc. all but have a blank check to do things like comp development boards, advance a tray of whatever device is relevant to the design, and so on.
Basically, as George Carlin put it, "it's a big club, and you ain't in it".
I don't have exact numbers, but I'm pretty sure you can get significant discounts starting around 100 parts. So not much at all.
Another thing to note is you can already get parts at significant discounts in one-off quantities through legit Chinese distributors like LCSC. For example, a XC7A35T-2FGG484I is $90 on Digikey and $20 at LCSC. I think a personalized deal for that part would be cheaper than $20, though...
How much does it depend on hard IP blocks? I mean, can it be ported to FPGAs from other vendors, like the Lattice ECP5? Did you implement PCIe in HDL or use a vendor-specific IP block? Please provide some resource utilization statistics. Thanks.
The GPU uses https://github.com/alexforencich/verilog-pcie + the Xilinx PCIe hard IP core. When using the device-independent DMA engine, that library supports both Xilinx and Intel FPGAs.
Implementing PCIe in the fabric without using the hard IP would be foolish, and definitely not the kind of thing I'd enjoy spending my time on! The design makes extensive use of the DSP48E2 and the various BRAM/URAM blocks available in the fabric. I don't have exact numbers off the top of my head, but roughly it's ~500 DSP units (primarily for multiplication), ~70k LUTs, ~135k FFs, and ~90 BRAMs. Porting it to a different device would be a pretty significant undertaking, but not impossible. Many of the DSP resources are inferred, but a lot of the timing depends on the DSP48E2's behavior: multiple register stages following the multiplies, inputs sized appropriately for that specific DSP's capabilities, etc.
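(For the curious, here's roughly what "inferred with the right register stages" looks like. This is just a minimal sketch with invented names, not FuryGpu code: a 27x18 signed multiply with registered inputs, a registered product, and a registered output, which lines up with a single DSP48E2's A/B, M, and P register stages.)

    module dsp_mul (
        input  logic               clk,
        input  logic signed [26:0] a,  // 27 bits: matches the DSP48E2 A port
        input  logic signed [17:0] b,  // 18 bits: matches the DSP48E2 B port
        output logic signed [44:0] p
    );
        logic signed [26:0] a_r;
        logic signed [17:0] b_r;
        logic signed [44:0] m_r;

        always_ff @(posedge clk) begin
            a_r <= a;          // input register stage (A/B registers)
            b_r <= b;
            m_r <= a_r * b_r;  // multiply register stage (M register)
            p   <= m_r;        // output register stage (P register)
        end
    endmodule

Drop any of those register stages and the DSP can no longer absorb them, so the path runs slower; widen the operands past 27x18 and the tools start stitching multiple DSPs (or fabric logic) together.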
In the post about the texture unit, that ROM table for mip level address offsets seems to use quite a bit of space. Have you considered making the mip base addresses a part of the texture spec instead?
The problem with doing that is that it would require significantly more space in the spec: at a minimum, one offset for each possible mip level. That data needs to be moved around the GPU internally quite a bit, crossing clock domains and everything else, and would require a ton of extra registers to keep track of. Putting it in a ROM is basically free: a pair of BRAMs versus a ton of registers (and the associated timing considerations), the BRAMs win almost every time.
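(To make the tradeoff concrete, here's a hypothetical sketch, not the actual FuryGpu table; names and widths are invented. It's a ROM of per-mip base offsets for square power-of-two textures, indexed by {log2(width), mip}, and the synchronous read is what lets the tools map it to block RAM rather than a pile of registers.)

    module mip_offset_rom (
        input  logic        clk,
        input  logic [3:0]  log2_w,  // e.g. 8 for a 256x256 texture
        input  logic [3:0]  mip,
        output logic [31:0] offset   // texel offset of that mip's base
    );
        logic [31:0] rom [0:255];

        // Mip m starts after mips 0..m-1, i.e. at
        // sum over k < m of (2^(log2_w - k))^2 texels.
        initial begin
            for (int w = 0; w < 16; w++) begin
                automatic logic [63:0] acc = 0;
                for (int m = 0; m < 16; m++) begin
                    rom[w * 16 + m] = acc[31:0];
                    if (w >= m)
                        acc += 64'd1 << (2 * (w - m));
                end
            end
        end

        always_ff @(posedge clk)
            offset <= rom[{log2_w, mip}];  // one BRAM read per lookup
    endmodule

The whole table costs a block RAM or two plus one cycle of latency, versus carrying sixteen 32-bit offsets per texture through every pipeline stage and clock-domain crossing.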
This is very awesome! Could you recommend some books if I want to do something similar? For example, on how to design a PCB for PCIe-pluggable hardware.