Writing an open source GPU driver without the hardware (collabora.com)
446 points by mfilion on Jan 27, 2022 | 90 comments



I've got a CS degree and of all the things that I studied, including OS design, nothing gives me that feeling of black magic like kernel hacking and driver building. It's like being told you're gonna learn how to draw an owl: you start with some ovals and then you have a completed owl.

Low level stuff has always been interesting to me, I just have no clue how one would even get started with it, or who would even pay me to work on it.


I also have a CS degree and felt the same black magic during my studies, but then I took some microcontroller classes, and they were eye-opening, in the sense that all of the CS operating system stuff suddenly made so much more sense.

For example, take something like (1) the simplest AVR microcontroller (the type used in Arduinos, hence there is a ton of third-party documentation and code floating around) and (2) your typical Hitachi HD44780 LCD display.

To control the display, one needs to pull certain pins high and low, for specific intervals -- this is all in the HD44780's data sheet. Of course, you don't do this every time you send a character; you abstract this away into functions setchar(x, y, c), clearDisplay(), and so on.

Presto, your first driver!
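
A minimal sketch of that layering in C, for anyone who wants to see it spelled out. The pin-level code is replaced by a hypothetical bus_write() that only records what would be sent, so this runs on a PC; on a real AVR it would set RS, put the byte on the data lines, pulse E and wait, with the timings from the data sheet. The function names are made up for illustration. The command codes (0x01 to clear, 0x80 | address to set the DDRAM address, second row starting at 0x40 on a two-row display) are the ones from the HD44780 data sheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pin-level code: instead of driving RS, E and
 * the data lines, record each transfer so the sketch can run on a PC. */
struct bus_transfer {
    uint8_t rs;    /* 0 = instruction register, 1 = data register */
    uint8_t value; /* byte that would be placed on the data lines */
};

#define BUS_LOG_SIZE 16
static struct bus_transfer bus_log[BUS_LOG_SIZE];
static size_t bus_log_len;

static void bus_write(uint8_t rs, uint8_t value)
{
    if (bus_log_len < BUS_LOG_SIZE) {
        bus_log[bus_log_len].rs = rs;
        bus_log[bus_log_len].value = value;
        bus_log_len++;
    }
    /* Real hardware: set RS, put value on the data pins, pulse E, wait. */
}

/* HD44780 instruction codes. */
#define LCD_CMD_CLEAR     0x01u
#define LCD_CMD_SET_DDRAM 0x80u
#define LCD_ROW1_OFFSET   0x40u /* DDRAM address where the second row starts */

/* Blank the whole display. */
void lcd_clear(void)
{
    bus_write(0, LCD_CMD_CLEAR);
}

/* Put character c at column x (0-based) of row y (0 or 1). */
void lcd_set_char(uint8_t x, uint8_t y, char c)
{
    assert(y < 2); /* assumption: a two-row display */
    uint8_t addr = (uint8_t)(x + (y ? LCD_ROW1_OFFSET : 0u));
    bus_write(0, (uint8_t)(LCD_CMD_SET_DDRAM | addr)); /* move the cursor */
    bus_write(1, (uint8_t)c);                          /* write the character */
}
```

Callers only ever see lcd_clear() and lcd_set_char(); everything about pins and timing stays behind bus_write(), which is the whole point of the abstraction.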

Then you might think: let's print stuff that we receive via a serial connection. So how do you react to that data? Then you learn about interrupts and interrupt handling.

Then you might think: I could be doing stuff while I wait for I/O. Actually, I'd like to do a lot of stuff, in parallel, but the AVRs are single-core (so to speak). So you do what you've heard other OSes do: take available compute time, divide it into slices, and distribute slices to processes. And thus, you have learned about multi-tasking.

This was relatively simple to do on AVRs, and great fun.
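
To make the time-slice idea concrete, here is a rough host-side simulation in C. It is only a sketch under assumptions: a "task" is just a counter of remaining work, and a slice is handed out by an ordinary function call. On a real AVR a timer interrupt would save and restore registers and stacks instead. All names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 4

struct task {
    int remaining;   /* ticks of work left; 0 means finished */
    int slices_used; /* how many slices this task has received */
};

struct scheduler {
    struct task tasks[MAX_TASKS];
    size_t count;
    size_t next;     /* index of the task to consider first next time */
    int slice_ticks; /* length of one time slice, in ticks */
};

void sched_init(struct scheduler *s, int slice_ticks)
{
    assert(slice_ticks > 0);
    s->count = 0;
    s->next = 0;
    s->slice_ticks = slice_ticks;
}

/* Register a task needing work_ticks of CPU time.
 * Returns its index, or -1 if the table is full. */
int sched_add(struct scheduler *s, int work_ticks)
{
    if (s->count == MAX_TASKS)
        return -1;
    s->tasks[s->count].remaining = work_ticks;
    s->tasks[s->count].slices_used = 0;
    return (int)s->count++;
}

/* Give one slice to the next unfinished task, in round-robin order.
 * Returns the index of the task that ran, or -1 if every task is finished. */
int sched_run_slice(struct scheduler *s)
{
    for (size_t tried = 0; tried < s->count; tried++) {
        size_t i = (s->next + tried) % s->count;
        struct task *t = &s->tasks[i];
        if (t->remaining > 0) {
            int ran = t->remaining < s->slice_ticks ? t->remaining
                                                    : s->slice_ticks;
            t->remaining -= ran;
            t->slices_used++;
            s->next = (i + 1) % s->count; /* the following task goes first next */
            return (int)i;
        }
    }
    return -1;
}
```

Calling sched_run_slice() in a loop until it returns -1 interleaves the tasks, which is the essence of what a preemptive scheduler does with a timer tick.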


Yeah, people don't realize that the operating systems we have today arose from devices about as powerful as some of these microcomputers. Once you remove the complexities that have been added to the x86 platform itself and can see "oh, when this pin goes high because I pressed that momentary button, the CPU runs the code at this memory address", things can really start clicking. A great YouTube creator I recommend for those types of explanations (he's making his own CPU out of base components) is Ben Eater, specifically his 6502 series, but the series on how USB keyboards work is also quite instructive for people looking at the hardware at that level of detail.


+1 to Ben Eater. His breadboard computer series, when I first watched it a few years ago, is when computers as a whole finally started to click for me, and not just the high-level programming languages.


Ben Eater is a legend - if not for his amazing designs, then at least for those beautifully wired breadboards!


The easiest way to get started is to just write a driver for whatever simple device you have.

It may be easier if you do some electronics as a hobby since you can create the devices yourself with simple enough protocols. It gives you a good understanding of how the hardware works at a basic level.

There are thousands of tutorials out there on how to get started writing simple drivers, as well as the excellent book "Linux device drivers".

I don't do _complex_ drivers, but I've been able to write simple ones for my DIY electronic devices (keyboards, sensors, ...) and even a reverse engineered one for my BOSS RC5 guitar pedal.


I have written some audio drivers for my own hardware and I still feel lost every time I start with a driver. Admittedly less so than the first time, but if there were decent up-to-date documentation it would really help a lot. (Perhaps my problem is more specific to ALSA documentation)


Start by tracing an existing open source Intel/AMD driver in Linux:

   It can be on an old laptop (a T420, etc., with an Intel GPU and driver).
   Use ftrace in the kernel.
   Set up eBPF tracing for GPU activity.
   Write some docs/blog/Medium pages on the process and show off your work.
   Understand, document and improve some open source GPU API related utilities.
   Understand and document the interaction between a GPU/GUI app and the open source driver.


In terms of jobs: AMD has 289 openings for intern positions: https://jobs.amd.com/go/Internships-&-Co-op-Opportunities/25... A lot of them are graphics related.


IMO the best way to get into this type of low-level tinkering is by writing a simple operating system.

https://github.com/isometimes/rpi4-osdev

There are other courses/projects for other boards. The keyword is usually “baremetal”.

For Linux drivers specifically there is training material from Bootlin etc. It definitely gives you a kickstart: the kernel is so complex that most of the device driver knowledge is organizational and cannot be found on YouTube.


> Low level stuff has always been interesting to me, I just have no clue how one would even get started with it, or who would even pay me to work on it.

One keyword is embedded programming. You can start out bare metal basically doing everything in a while(1) loop until you get to a point where different tasks (in the literal way) and peripherals get so much that you will end up doing exactly this, but on a much smaller (and comprehensible) scale.


You set up a table of functions that the HW invokes when things happen. These functions add some sort of event onto a queue. At some point in the future, the kernel processes these events. Rinse, repeat.
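
A small C sketch of that pattern, simulated on a PC. Everything here is an assumption made for illustration: the two interrupt sources, the raise_irq() function that plays the role of the hardware, and the fixed-size queue. On real hardware the queue would also need protection against an interrupt arriving while the main code is modifying it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum irq { IRQ_TIMER, IRQ_UART_RX, IRQ_COUNT };

struct event {
    enum irq source;
    uint8_t payload;
};

/* Fixed-size ring buffer of pending events. */
#define QUEUE_SIZE 8
static struct event queue[QUEUE_SIZE];
static size_t q_head, q_tail, q_len;

static bool queue_push(struct event e)
{
    if (q_len == QUEUE_SIZE)
        return false; /* queue full: the event is dropped */
    queue[q_tail] = e;
    q_tail = (q_tail + 1) % QUEUE_SIZE;
    q_len++;
    return true;
}

static bool queue_pop(struct event *out)
{
    if (q_len == 0)
        return false;
    *out = queue[q_head];
    q_head = (q_head + 1) % QUEUE_SIZE;
    q_len--;
    return true;
}

/* Handlers do as little as possible: they only enqueue an event. */
static void on_timer(uint8_t data)   { queue_push((struct event){ IRQ_TIMER, data }); }
static void on_uart_rx(uint8_t data) { queue_push((struct event){ IRQ_UART_RX, data }); }

/* The table of functions, indexed by interrupt number. */
typedef void (*irq_handler)(uint8_t data);
static const irq_handler vector_table[IRQ_COUNT] = {
    [IRQ_TIMER]   = on_timer,
    [IRQ_UART_RX] = on_uart_rx,
};

/* Stand-in for the hardware raising an interrupt. */
void raise_irq(enum irq n, uint8_t data)
{
    assert(n < IRQ_COUNT);
    vector_table[n](data);
}

/* State updated later, outside of interrupt context. */
static unsigned timer_ticks;
static char rx_buf[16];
static size_t rx_len;

/* Drain the queue; returns how many events were processed. */
size_t process_events(void)
{
    struct event e;
    size_t handled = 0;
    while (queue_pop(&e)) {
        switch (e.source) {
        case IRQ_TIMER:
            timer_ticks++;
            break;
        case IRQ_UART_RX:
            if (rx_len < sizeof rx_buf)
                rx_buf[rx_len++] = (char)e.payload;
            break;
        default:
            break;
        }
        handled++;
    }
    return handled;
}
```

The split matters because the handlers run at unpredictable moments and should return quickly, while process_events() can take its time in the main loop.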


That, and MMIO. Lots and lots of mapping memory ranges/addresses to peripherals.
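
For readers who haven't seen MMIO, a hedged C sketch: the peripheral's registers are described as a struct of volatile fields, and the driver reads and writes them through a pointer. The GPIO block below is hypothetical (not any real chip's layout), and the address in the comment is made up. Passing the pointer in as a parameter lets the same code run against a plain struct on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Register layout of a hypothetical GPIO peripheral (not a real device). */
struct gpio_regs {
    volatile uint32_t dir; /* one bit per pin: 1 = output */
    volatile uint32_t out; /* output level per pin */
    volatile uint32_t in;  /* input level per pin (set by the hardware) */
};

/* On real hardware the block sits at a fixed address from the data sheet,
 * for example (made-up address):
 *   #define GPIO0 ((struct gpio_regs *)0x40020000u)
 * "volatile" stops the compiler from caching or dropping the accesses. */

void gpio_make_output(struct gpio_regs *gpio, unsigned pin)
{
    assert(pin < 32);
    gpio->dir |= (uint32_t)1 << pin;
}

void gpio_write(struct gpio_regs *gpio, unsigned pin, int level)
{
    assert(pin < 32);
    if (level)
        gpio->out |= (uint32_t)1 << pin;
    else
        gpio->out &= ~((uint32_t)1 << pin);
}

int gpio_read(const struct gpio_regs *gpio, unsigned pin)
{
    assert(pin < 32);
    return (int)((gpio->in >> pin) & 1u);
}
```

In a kernel, getting that pointer is usually the fiddly part (mapping the physical range into the address space); once you have it, the driver code looks much like this.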


Try running NetBSD. You don't have to marry it, just put it on a spare machine or even in a VM.

The thing about NetBSD (and I can vouch for OpenBSD being much the same) is that it's so stinking well documented, if you go in knowing C you can learn how to write a simple device driver using just the man pages as your guide. Section 9 of the manual in particular documents every kernel API and most if not all internal types.

It's a great OS to practice developing low-level code with. Linux is a bit harder.


Well, continue classes and your studying. Have you implemented semaphores?


What does implementing semaphores have to do with this?

When it comes to writing firmware or an OS, nothing can substitute for hands-on experience with actual hardware.


Semaphores are important when adding support for SMT in an OS kernel.


Likewise, nails are important when building a house, but you do not learn how to build a house by studying nails.


This is a terrible analogy. I learned to implement semaphores when building a kernel. It's absolutely required to learn, a necessary step to building the "house".
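
For anyone wondering what "implementing semaphores" looks like at its smallest, here is a sketch of a non-blocking counting semaphore using C11 atomics. It is an illustration under assumptions, not how any particular kernel does it: a real kernel semaphore would put the caller to sleep on a wait queue instead of returning false, and the ksem_ names are invented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ksem {
    atomic_int count; /* number of permits currently available */
};

void ksem_init(struct ksem *s, int permits)
{
    assert(permits >= 0);
    atomic_init(&s->count, permits);
}

/* Take one permit if any is available; never blocks. */
bool ksem_try_acquire(struct ksem *s)
{
    int old = atomic_load(&s->count);
    while (old > 0) {
        /* Succeeds only if count still equals old. On failure, old is
         * refreshed with the current value and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&s->count, &old, old - 1))
            return true;
    }
    return false;
}

/* Return one permit. */
void ksem_release(struct ksem *s)
{
    atomic_fetch_add(&s->count, 1);
}
```

The compare-and-swap loop is what keeps two cores from both taking the last permit; the hard part in a kernel is everything around it (sleeping, waking, interrupt context).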


Why is it that closed source drivers are the norm rather than the exception for hardware companies?

People need to buy your hardware to use your drivers.

Meaning driver development is a cost. The revenue comes from hardware sales. You're usually offering the driver to the public for free on your website anyway. Why not simply open source it?

If you open source your driver, community developers could potentially give you free help -- e.g. porting to Linux, getting into the mainline kernel, fixing bugs.


For two reasons:

- driver can reveal internal workings of their “proprietary” hardware

- they use driver features to further differentiate their hardware product

But really it’s just old tradition and paranoia.


> - driver can reveal internal workings of their “proprietary” hardware

Every somewhat solid HW engineer from the competition will still be able to grasp those inner workings from a proprietary driver and the HW itself. In the end they cannot just copy (most) things over because of potential legal issues, no matter whether the driver is FOSS or proprietary.

> - they use driver features to further differentiate their hardware product

> But really it’s just old tradition and paranoia

Agreeing here, albeit AMD and Intel seem to have (somewhat) grasped that it doesn't need to be that way.


> Why is it that closed source drivers are the norm

Closed source is the norm, no matter if it's a hardware company's driver/companion app or a service company's client app. It's rarely economical, but usually "makes business sense" as in there's one more proprietary thing that you own and that increases your apparent value to investors. Some reasons I've heard:

- opening it would lower the barrier to entry for competitors (as if software is the big barrier)

- it could be licensed to other companies in different industries (that almost never happens)

- closed source is more secure and therefore less risky (a classic fallacy)


Software is a gigantic barrier even when you put hardware next to it. Maybe not for highly sophisticated hardware like video cards, but generally, yes.


Hardware: concept, iteration on paper, CAD, source parts for prototype, build prototype, write firmware, iterate prototype, CAD, source parts for product, find a manufacturer for custom parts, final assembly, certification, packaging, logistics, shipping.

Software: hire 2 junior programmers, give them the hardware and protocol specs, wait 2 months, upload the executable to a website.

The vast majority of "software for hardware" these days is just a GUI sending commands to the firmware over whatever communication channel is available. GPU drivers are pretty much the single example of very complex "software for hardware" in the mainstream market.


Patents you have licensed, but do not own. And so your lawyers advise you not to release any source code.


And all the ones you didn't license too, because you don't want any patent trolls to find out you're using a technique vaguely similar to some patent you've never heard of. Aren't patents great?


While the revenue comes from hardware, the driver is actually the centre of value. Just look at the history of graphics cards (even before the term GPU was born): there was plenty of decent graphics hardware that failed to compete simply because of poor drivers.

Of course this also means there is potential for a competitor to gain a competitive advantage when they don't have to rely on GPUs for revenue and profit generation. I expect or hope that to be Intel. (That is ignoring potential GPU patent issues.)


> While the revenue comes from Hardware, Driver is actually the centre of value.

And IME it is the proprietary drivers that usually offer least value and most problems. That sways buying decisions.


Then explain why everyone buys Nvidia cards for GPGPU?


Because GPGPU software stacks do a very poor job of supporting either OpenCL or Vulkan Compute, which are the main free alternatives to CUDA. This is especially problematic in the ML space.


Yes, that was my point, which contradicts what foxbluff wrote: >> And IME it is the proprietary drivers that usually offer least value and most problems <<


CUDA does not inherently offer "more value" than Vulkan Compute, though. It was simply developed earlier.


> Why is it that closed source drivers are the norm rather than the exception for hardware companies?

Because when you want to deprecate your old hardware and push your customers to the new one, an open source driver that is still maintained or, even worse, gains features that should be exclusive to your new hardware is a problem. Planned obsolescence is the name of the game in the modern hardware industry. And even "open source" drivers like mwlwifi show a similar pattern; good luck fixing their ¢¢^√°^¢^° firmware.


Also, when you depend on the driver to differentiate between high-end and low-end devices, an open source driver is also a no-go, e.g. Nvidia throttling consumer devices in favour of enterprise-grade ones.


Has this changed in recent years? Back in the day the consumer devices were faster but less reliable (no one cares if two or three pixels are off in a 3D game). Cards based on the same hardware sold to industrial customers were just from better silicon bins and run more conservatively. If you're running serious simulations you want accurate, reproducible results. OTOH, aerodynamic and seismic simulations can also be one or two "pixels off", and industry started to realize it could get faster equipment for cheaper by buying consumer grade...

This was all ~10 years ago, though, I think nowadays they just fleece everyone.


It’s a mindset issue

Sure, sometimes there is special sauce in the drivers too

But mostly it’s just culture. Hardware companies also don’t like cloud software…because they think on prem is safer or whatever


Any company that is going to negotiate with a cloud company, or that is going to compete with a cloud company, should avoid using any services from that cloud company. Otherwise you risk giving them access to strategy papers, or even internal communication about strategies or negotiation limits.

Cloud companies are some of the biggest customers of hardware companies like AMD. Eventually they will come to a point where they negotiate.


That's a pretty serious accusation. The reputation of things like O365 and Google Docs/Mail would be shattered if it turned out that MS or Google were reading internal documents or emails of other companies for anti-competitive purposes.


> That's a pretty serious accusation.

It was not an accusation. A risk was articulated.

The sheer quantity of hardware purchased by these companies is relevant when assessing risk/reward.


Linux magic, mucking about in the kernel, M1 compatibility hacking, this writeup has got it all!

Unfortunately, having read this, I feel rather stupid.


you're probably smart at something else


> probably


As someone who never touches drivers or hardware directly, these sorts of stories are just wild to me, I love reading them. Congrats to Alyssa and the team; as usual, a very interesting write up and impressive work.


This is going to be great news for new MediaTek Chromebooks coming out in June with 2016 MacBook level CPU performance. The rk3399 was just barely usable with a lightweight window manager. I may even pick up one of the 8192s in the meantime if a distro like Arch Linux ARM starts supporting it.

Great work from Alyssa and everyone else at Collabora.



I'm not excited about M1 at all, tbh. I'd rather have a corebooting chromebook, even with the performance differences.


mt8192 is already buyable for me, that said mt8195 is a significant bump over that: https://www.mediatek.com/products/products/tablets/mediatek-...

Hopefully that reaches retail quite soon.


And it's already time to start on the Mali-G610/710 for the RK3588.


I clicked the link to check out whether the article was written by Alyssa and - yep! Damn GPU wizard.

Thanks a lot to Alyssa, Collabora and everyone else involved for the amazing work!


This is amazing work. But what’s the motivation? I understand that in the 90s Linux was niche and we had to write our own drivers. But today Linux runs on 2B devices - shouldn’t hw vendors write their own drivers today? Why even try to support open source for non-co-operative hw companies - why not just stick with “good” hw companies?


By and large, HW vendors are writing hacked-together, barely-functional drivers that only work on a heavily patched "downstream" fork of a single version of the kernel code. This is for proper support that can actually be merged in the mainline kernel and maintained for the foreseeable future.


Why would anyone pay for it?


There's enough of a business demand for it. What surprises me is that this leads to (presumably) smaller customers paying Collabora to do it instead of the SoC manufacturer just providing a usable driver.


I imagine Collabora has embedded Linux customers who prefer to use an upstream kernel rather than some kind of Android blob kludge.


Because it gets rid of an especially nasty sort of technical debt. It makes that piece of hardware useful with reliable, up-to-date versions of the kernel.


> Why would anyone pay for it?

Because doing it properly is too expensive for the average customer? Also, in capitalism you only need to outrun your competitors, not achieve some kind of perfect driver. So when your competitors are as &'££&-- as you, spending time on a perfect driver is a waste.


I really enjoy Alyssa's writing style.


Does it mean we can have an open source Nvidia GPU driver as well?


GPUs that don't happen to share the silicon with the CPU like on most mobile SoCs are a lot more independent. In the case of NVidia that means integral parts like clocks are locked behind signed firmware and open source drivers are forever stuck running them in their "just enough to show a boot logo" mode.


Would it not be possible to distribute a program that downloads the proprietary Nvidia driver, extracts the firmware and loads it into the running kernel?

IIRC there was a Debian installer for Sun's JDK/JRE that did precisely that.


Nvidia's TOS forbids doing that or publishing any software that does that for others.


I am glad they didn't manage to acquire ARM. Worst company ever to be allowed such control over computing. Having to deal with their GPU drivers and CUDA blobs is bad enough already. Pure evil.


> Nvidia's TOS forbids doing that or publishing any software that does that for others.

IANAL, but just because a TOS claims something doesn't mean it is enforceable in all jurisdictions, and it could potentially run afoul of certain interoperability or other provisions.

I'll bite. Has anyone traced the initialization of the NVIDIA binary driver and figured out what is so special for reclocking and/or reproduced it without the binary driver? If you do not want to reply publicly email is in profile.

Good luck


> Has anyone traced the initialization of the NVIDIA binary driver and figured out what is so special for reclocking and/or reproduced it without the binary driver?

From what I understand [1], the chip itself will check if the running firmware has been signed by NVIDIA and refuse to run at higher clock speeds if not.

[1]: https://nouveau.freedesktop.org/


Exactly. Maxwell and later Nvidia architectures contain internal CPU cores that manage parts of the GPU and I believe even job scheduling. They refuse to run anything not signed by Nvidia, so developing an open source high performance driver would require obtaining their root keys. If someone does that and can't prove they did it by brute force or some other vulnerability in Nvidia's software, they're going to be in a world of hurt.


Yeah, but my point was to develop a tool to extract NVIDIA's proprietary signed firmware, make nouveau capable of interfacing with said firmware, and distribute the extractor tool as part of a Linux distribution.

That way you'd have both high-speed acceleration and an as-far-as-possible open source driver.


Why should I, as the author of a piece of downloader/extractor software, be liable for what others do with my software? As the author, I'm not bound by EULAs I never agreed to.

As an alternative to downloading, I could also write software that asks the user to supply the NVIDIA installer binary and extracts the firmware out of it.


It's amazing that such practices are even legal.


And if someone does it anyway, then what?


I imagine once they notice and decide to care, they sue you for copyright infringement or breaking a contract (I'm really not a lawyer; I know using a piece of software without a license is something you can sue over, but not details).


What contract? Do you have to sign one when you buy a GPU?


EULA; I know it's not a "sign on the dotted line" contract, but it still has some legal power, probably, in the USA at least. If you don't accept the EULA then you're "just" breaking copyright law, I think. (Again, IANAL, I know I'm not doing great at details, but I'm fairly sure it works roughly this way)


Were there any cases of a EULA being actually upheld in a court? I mean, a video card is something you buy to own. It's a device for highly parallelized number crunching. The manufacturer deliberately interferes with your ability to use the device you bought with your choice of operating system. It feels like such behavior should be illegal and NVidia would be at fault, not the person who has merely defeated their shitty lock-in tech.

Also you technically don't agree to a EULA when you download an installer via direct link and unpack it to extract the files required to initialize the thing you own. You never run the installer and never click the "agree" button.


Yes, particularly aggressive companies have had their EULAs upheld in court (in the US).

Circumventing the EULA prompt is considered agreement to the EULA terms in the US. If you fight that too much in court, it would then be considered software piracy.

The way things tend to be enforced is that while you own the physical piece of hardware, you do not own or necessarily have any rights to the software on it, or for it.


Thing is, you could usually reverse engineer a piece of hardware enough to write your own software for it, as described in the OP article. That's fully legal. But NVidia designed their hardware on purpose such that you're required to use their proprietary blob to initialize it and you can't substitute it with something open-source. So hardware is effectively useless unless you send it this particular Highly Proprietary™ sequence of bytes that NVidia won't let you have without way too many strings attached.

I guess one way would be to write a driver that does this exact thing this thread suggests, and release it under a license that forbids its distribution into and within the US. And of course never ever host any parts of it on any infrastructure located in the US or owned by a US company.


I mean, good luck. The internal CPU core you have to initialize will only execute signed code for the most part, so any open source high performance driver for the card would have to get hold of their root keys. And doing that is covered by quite a few international laws, so whoever did it would be pretty screwed.


Exactly. Maybe not a problem for personal use. A large open source project would draw their ire though.


The open source nouveau driver has existed for years; the problems with it are mainly caused by Nvidia not releasing signed firmware for use by nouveau, not releasing specifications, etc.


What about writing a tool to extract that signed firmware from the proprietary drivers and just using it? I think with some additional reverse engineering it's probably possible to automate this, assuming NVIDIA will not start changing the drivers specifically to fight this.


People might actually start doing similar things if nVidia starts dropping support in their proprietary driver for cards that can't otherwise be usefully supported by nouveau. At that point, it'll be easy to rely on the observation that such actions are necessary for interoperability, hence justified as fair use. Not sure if we're at that point just yet.


Nvidia cards below the 800 series are unsupported by the 495 driver, which adds support for GBM, which is necessary for most Wayland compositors to have hardware acceleration. I'd say we're basically at that point.


The actual firmware issues seem to have started with the Maxwell series, which apparently still gets proper support. Nouveau might behave badly with lower-series cards, but not due to lack of signed firmware.


What if you wanted to use Linux with an Nvidia card on a CPU that their driver does not support (e.g. risc)? That would presumably also fulfill the interoperability requirement, right?


As far as I know, nouveau is limited not so much by reverse engineering resources as by actual security measures put in place by Nvidia. IIRC you need some kind of key to clock the card up to a reasonable frequency.


What an asshole company. I think they have displaced Oracle from the number one spot. I wish the people who thought up this idea of locked hardware a terrible diarrhea.


The open source Nvidia driver is called nouveau and it already exists. What are you really asking?


Nouveau is buggy and has poor performance, and sometimes gets blacklisted by various software because of this (Example: Chromium - https://news.ycombinator.com/item?id=18834715). It doesn't help that NVIDIA is basically doing nothing to help the nouveau team do a better job, and that their own driver is also buggy and has poor performance.


So fun to read!

Question wrt:

> If Linux doesn’t know a clock or power domain is used by the GPU, it’ll turn off the GPU inadvertently.

If you know the name and ID of the GPU, is there a command from the terminal to tell Linux to turn the thing back on?


If you're interested in this, you might be interested in https://github.com/jbush001/NyuziProcessor


How the fuck?



