VirGL – A virtual 3D GPU for use inside QEMU virtual machines (mesa3d.org)
243 points by skibz on Aug 13, 2023 | 81 comments



Note also: it just got an experimental Windows driver!

[1] https://github.com/virtio-win/kvm-guest-drivers-windows/pull...


> disables preemption systemwide

I guess that means not usable day-to-day. One infinite loop in any app and the whole OS will freeze forever. I didn't even know any modern OS could operate only with cooperative multitasking. Windows 3.1 in 1992 and PowerPC Mac OS until 2002 were the last mainstream OSes to use it...


I believe he's referring to GPU preemption. I don't think it's even possible to disable CPU preemption on Windows. On nix you can.


For decades no one used "nix" to mean Unix and Unix-like OSes (they used "*nix" and occasionally "Lunix"); then a package manager named Nix became important, and now people have confusingly started using "nix" to mean Unix and Unix-like.


Interesting info. It seemed like a reasonably common one to me over the past decade on the web.


I can't really blame you, it's a perfectly reasonable concatenation imo. Pretty sure I've seen it at least occasionally over the years as well.

That said, he is correct that now “nix” is going to likely be popularly associated (especially with tech types that love excessive things) with the Nix language, package manager, and NixOS, which has really been gaining steam. Especially these past few years.

It was what I first thought of as well, given I use them.


...And of course after the edit window is over, it occurs to me I got my wires crossed. "Concatenation" is adding two strings/words.

What I meant was "contraction".


What’s the performance like on this?


Imagine still using Windows in 2023


VirGL is a poor solution to the pressing problem of virtualized graphics. It only really exists because the hardware makers AMD/Intel/Nvidia, in their infinite greed, refuse to support VFIO on all GPUs the way IOMMU is supported on nearly all CPUs.


That's 100% fair. Good thing it's not too difficult to do VFIO assignment within QEMU for virtual machines despite the manufacturer shenanigans. :)

The Arch wiki has a great guide here - https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM...

It does get a little tricky if your GPUs are identical, but I've done this for years and maintain a guide for doing this (as well as the ACS-override patched kernel RPMs) for Fedora.

- Writeup - https://some-natalie.dev/blog/fedora-acs-override/

- Code + RPMs - https://github.com/some-natalie/fedora-acs-override

As far as concerns around stability with ACS override, I tend to only enable the override for the specific GPU (or other hardware) that I'm passing through and haven't encountered any stability problems or memory leaks that'd interrupt desktop or light server usage. I also used to run this for a bunch of white-box GPU hardware for a customer at a former job and it worked well for exploratory AI/ML workloads before investing in the big Nvidia DGX boxes. YMMV, of course!
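
If it helps anyone, the core of the setup boils down to two pieces: a kernel parameter that reserves the guest GPU for vfio-pci at boot, and QEMU flags that hand it to the VM. A rough sketch (the PCI IDs and addresses below are placeholders - pull yours from lspci -nn):

  # /etc/default/grub - bind the guest GPU and its audio function to vfio-pci
  GRUB_CMDLINE_LINUX="... intel_iommu=on iommu=pt vfio-pci.ids=10de:1b80,10de:10f0"

  # then hand both functions of the card to the guest
  qemu-system-x86_64 -enable-kvm -machine q35 ... \
    -device vfio-pci,host=01:00.0 \
    -device vfio-pci,host=01:00.1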


Is there a reason an emulator like QEMU couldn't just pass along SPIR-V like WASM does with WebGPU? Would that be way slower?


A better way [for me, anyways] is getting GRID drivers running. However, this only works with the 9xx cards up to the 2080.

https://gitlab.com/polloloco/vgpu-proxmox


It's not difficult but it misses the point. SIOV supports thousands of VFs because that's what you need if you want a sandboxed app-per-VM security model, where statically compiled VMs are just as performant as containers but more secure.


  Your guest GPU ROM must support UEFI

Why is this a requirement? I thought Linux takes over after it loads.


Correct me if I am wrong on this, but to me, it would seem that something like VirGL would still serve a purpose even with widespread full SR-IOV support on consumer GPUs, as VirGL could find application in many scenarios where a GPU vendor's drivers are not compatible with the guest.

That said, I do agree that vendors should enable support in consumer GPUs, and I feel that their focus on protecting server sales is going to turn out misguided in the long term. Intel especially has disappointed in this area, as they did allow such functionality on their GPUs in the past, but have recently removed that.

AMD's mainstream CPUs supporting server features such as ECC have also proven that such restrictions aren't necessary, and allowing this type of capability on mainstream platforms in no way harms enterprise sales.

That being said, any effort focused on GPU virtualization or drivers impresses me immensely and I very much appreciate the work done on VirGL.


>Correct me if I am wrong on this, but to me, it would seem that something like VirGL would still serve a purpose even with widespread full SR-IOV support on consumer GPUs, as VirGL could find application in many scenarios where a GPU vendor's drivers are not compatible with the guest.

Pretty much every guest OS (Windows, Linux, BSD) has drivers that would work with a native PCIe VF GPU device. macOS still has AMD drivers, but only up to RDNA 2. I can't think of any guest that would support a GL device but not have a native driver.

>That said, I do agree that vendors should enable support in consumer GPUs, and I feel that their focus on protecting server sales is going to turn out misguided in the long term. Intel especially has disappointed in this area, as they did allow such functionality on their GPUs in the past, but have recently removed that.

Intel supports SR-IOV/SIOV on consumer CPU iGPUs (Xe: 11th, 12th, and 13th gen) but not dGPUs (A770, A750...), which is very frustrating. Indeed, 'enterprise features' such as ECC or IOMMU on consumer chips have not affected server sales.

>That being said, any effort focused on GPU virtualization or drivers impresses me immensely and I very much appreciate the work done on VirGL.

agreed


GPU Virtualization: life's toughest challenge ;)

GVT-g high-level design

https://projectacrn.github.io/1.6/developer-guides/hld/hld-A...


GVT-g was Intel's first crack at GPU virtualization and is now abandoned. It was supplanted by Intel supporting SR-IOV, which itself was succeeded by Intel's SIOV.

https://www.intel.com/content/www/us/en/developer/articles/t...


Yeah, I would love to have API-layer passthrough as well as PCIe-layer passthrough. API passthrough would work well for things like containers or sandbox environments.


AMD supports VFIO on most of their cards. All of the RDNA-based cards support it, even some pre-RDNA ones too, and with a recent-ish driver, NVIDIA's GeForce line supports it as well without hacks.

The problem isn't really HW support; it's that the software side is super glitchy, it's not all that easy to configure, and in most cases it requires a 2nd GPU if you still want basic host functions.

Whereas VirGL is much easier to get working and doesn't require specific HW support, as far as I'm aware.


You are likely confusing what is better termed "PCIe passthrough" with virtual functions, wherein one physical GPU presents itself on the PCIe bus as many GPU functions (dozens in the case of SR-IOV, or thousands in the case of SIOV) which can be passed to dozens or thousands of GPU-enabled VMs.

https://events19.linuxfoundation.cn/wp-content/uploads/2017/...
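
To make that concrete: on hardware and drivers that support it, carving out VFs is just a sysfs write as root (the address below is the usual Intel iGPU slot - yours may differ):

  # create 4 virtual functions on an SR-IOV capable GPU
  echo 4 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs

  # each VF now shows up as its own PCI device that can be handed to a VM
  lspci | grep -i vga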


Unfortunately, AMD cards still suffer from a reset bug when used with passthrough.

The reset bug being that you can pass through the card fine, once. But if you try to pass it through again (or the card experiences an issue and needs to reset), it gets caught in some kind of bad state and won't work until power is removed and restored. Which requires a reboot, or an only slightly less disruptive dance with system power states.

For Vega and 5000 series GPUs, there's https://github.com/gnif/vendor-reset
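
If I remember its README right, usage is roughly: load the module, then tell the kernel to use the device-specific reset for the card (PCI address is a placeholder):

  modprobe vendor-reset
  echo device_specific > /sys/bus/pci/devices/0000:03:00.0/reset_method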

Incidentally, Nvidia GPUs are so good at resetting, they've probably done so without you noticing. If the screen ever goes black for a fraction of a second and returns in normal usage, it was probably because the GPU reset itself.

The 6000 series cards below the 6800s, for example, may or may not have the issue. It seems most "reference" cards are fine, but custom vendor cards often (but not always) have issues. My reference 6700 works fine, but a Sapphire 6700 probably won't.

And the 7000 series is also fucky in a new way somehow. Gnif knows far more about this than me, and has basically thrown up his hands at how AMD doesn’t care. He’s made occasional posts about it on https://forum.level1techs.com/

Gnif is also responsible for Looking glass: https://github.com/gnif/LookingGlass

When it comes to splitting a GPU into virtual ones with SR-IOV/MxGPU, that's not really a thing with AMD. Whereas Nvidia will happily give you what you want if you shovel some money into their gaping maw, AMD won't even give business customers the time of day if you aren't worth billions. They very deliberately do not want the unwashed masses to use MxGPU. See https://www.reddit.com/r/VFIO/comments/eqvn9z/amd_mxgpu_or_s... for a summary of the years of hopelessness on this functionality.


You could also be in a fun grey area where your 6600XT will sometimes reset properly and allow the physical host to reclaim it for its own purposes. Or not and require a full forced shutdown and reboot to restore proper function.

Let's just say I'm very aware of AMD's issues in this area :P

Also, Looking Glass is pretty great, though its use of the Desktop Duplication API seemed to carry a huge performance hit with it. Or rather it did the last time I used it (it's been a while).


Isn't it dangerous to give a guest direct access to a GPU?


It's not really too different from giving your web browser access to your GPU (and by extension to random websites using WebGL). So yes, it's dangerous. But it is at least a threat which designers of GPUs are already considering. Although there have been interesting bugs where GPU memory wasn't zeroed before being allocated to a new context, and you could read previously written graphics memory to find secrets.

As long as 1 GL context on the guest side == 1 GL context on the host side then it _should_ at least be as safe as letting your web browser access your GPU but certainly not as safe as using an IOMMU to segregate a whole GPU solely for your VM.


I feel like a good half of machines I find with glitchy graphics drivers seem to show bits of textures from one application inside another application - indicating memory contents leakage between contexts. Chunks of webpages from Chrome appearing in 3D games seems common.

And those are accidentally caused leaks. As soon as someone starts storing actually sensitive data in graphics memory, I'm sure lots of methods to deliberately cause leaks will be found.


I've had a more extreme case in a dual boot configuration of some graphics corruption on my Linux desktop exposing a mirrored and discolored frame of my prior Windows desktop from before rebooting.


Security isolation isn't the only possible goal of virtualization.


It's not dangerous because it requires IOMMU support; the GPU can only access the memory space of the guest.


You mean SR-IOV?


There is a project I am following that embeds a Linux machine into mobile VR applications using virtio-gpu, and his latest effort gets 200 fps.

"VR Linux is now testing with working GPU acceleration via the kernel over virtio-gpu. This is a major step forward following on from the Shadertoy device that we made a few months ago. Now we can make OpenGLES Linux programs that are accelerated by the Quest's Adreno GPU."

[1] Thread: https://twitter.com/anjin_games/status/1371870094490537987?t...


VirGL is definitely an interesting project, but all one has to do to get GPU passthrough working (from a Linux QEMU host to any guest OS) is:

1.) research a cheap, secondary GPU that is natively supported by the guest OS,

2.) plug such a secondary GPU into a PCIe slot on the host and hook it up to the primary monitor with a secondary cable (D-Sub vs. DVI, etc.),

3.) set up Linux to ignore the secondary GPU at boot and configure a QEMU VM for the GPU passthrough.

The whole process takes perhaps [edit: "a few"] hours and works flawlessly, with no stability issues [edit: "at least with Asus motherboards"]. (Switching across the two GPU cables can be accomplished in software by using Display Data Channel /DDC/ utilities, and switching the keyboard/mouse can be accomplished by using evdev /event device/ passthrough.) More information: https://github.com/kholia/OSX-KVM/blob/master/notes.md#gpu-p...
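
To sketch the DDC and evdev bits (the VCP value and the evdev path are placeholders - every monitor and keyboard differs):

  # flip the monitor to the VM's input via DDC (VCP feature 0x60 = input source)
  ddcutil setvcp 60 0x11

  # give the VM the physical keyboard; pressing both Ctrl keys toggles the grab
  qemu-system-x86_64 ... \
    -object input-linux,id=kbd1,evdev=/dev/input/by-id/usb-XXXX-event-kbd,grab_all=on,repeat=on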


> plug such a secondary GPU into a PCIe slot on the host

Where can I find an unused PCIe slot on my laptop?

Also it is not very rational to buy 2 GPUs and use only one at a time.


All gaming laptops come with a secondary GPU. What I do on mine is disable the dedicated GPU, enable passthrough, and voila - shitty Windows can run virtualised with a dedicated GPU. Takes 5-10 mins to do. Also, if your laptop supports Thunderbolt or USB4, then all you need is an eGPU. But that's the pricier solution.

Having said that, virgl is pretty darn sweet!


I have an AMD 290 or whatever GPU to run NixOS (will be upgrading to Intel soon) and a 3060 that I pass into a Windows VM for gaming. I feel very rational.

Finding an unused PCIe slot in a laptop is hard; I haven't tried it, but I imagine Thunderbolt could work for this purpose.


Except that requires IOMMU, which is not always available, nor always reliable, on consumer motherboards.


Good point, but I believe that was a serious problem 10 years ago, while these days virtually any decent motherboard properly supports IOMMU, consumer-grade boards included - e.g. any Asus motherboard should work perfectly.


They might support IOMMU, but the default groups can be very annoying to work with, so you have to use a patched kernel as well that "ignores" the actual groups.
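
For reference, the usual way to check how sane your board's grouping is (roughly the snippet from the Arch wiki):

  # list every device by IOMMU group; a GPU lumped in with other devices
  # is what forces the ACS override hack
  for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done
  done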


IOMMU is an integrated CPU feature, so consumer motherboards do not affect reliability - they just make the setting available.

IOMMU is also required for various modern security features, and M$ requires it for certification nowadays to protect against DMA vulnerabilities (heard of Thunderbolt?).


No.

Just because your CPU supports IOMMU, that does not mean GPU passthrough is going to work properly, or that groups are set up properly, or....


It doesn't, but if you want to do this you might wanna consider not buying the cheapest motherboard either way.


And we've come full circle :)


See https://looking-glass.io/ to get around directly connecting monitors

But forwarding real GPUs limits the number of VMs and can cause stability issues if unlucky - PCIe device bugs, especially reset bugs, are not unusual. Had problems with e.g. a forwarded rx580 that would require a hard reboot of the host to fix...

Things like Intel GVT-g and VirGL are better solutions, when they can be used.


> See https://looking-glass.io/ to get around directly connecting monitors

Interesting, thanks for the link!

> Had problems with e.g. a forwarded rx580

I've been forwarding a Sapphire Radeon RX 580 Pulse to both Windows and macOS for literally years and, except for the specific problem of host sleep/wake, have had no problems whatsoever. Perhaps try an Asus motherboard?

> Things like Intel GVT-g and VirGL are better solutions, when they can be used.

Sure, when software solutions are available and stable (and there's no need for near-native GPU performance), they are definitely easier to work with. However, as of today, GPU passthrough is probably the only solution available for a daily driver.


I'm not sure what you mean by forwarding, but if you mean regular GPU passthrough and the reset bugs with AMD GPUs, then for the RX 580, vendor-reset should typically work: https://github.com/gnif/vendor-reset

It does not work with the lower 6000 series (below the 6800) and the 7000 series, which can also have reset issues.


> The whole process takes perhaps one or two hours and works flawlessly, with no stability issues.

Good joke, that really made me laugh :)

I tried this on my ASRock X370 Taichi a while back. Turns out that there is a bug in older BIOS versions and the whole thing just freezes when starting the QEMU VM. Then there is an intermediate BIOS version with which I actually managed to get it working. Unfortunately, I later upgraded my CPU and had to install a new BIOS, and this again completely breaks the IOMMU groups. I probably spent a few days getting everything running, including downgrading from a non-downgradeable BIOS version.

And even when it was working, it was a pain to use. Want to use the passthrough GPU in Linux? Now I have to dual-boot QEMU VMs, or disable the passthrough, reboot, then enable it again and reboot one more time...

I really want proper GPU virtualisation...


> Good joke, that really made me laugh :)

I've been forwarding an AMD GPU to both Windows and macOS for literally years across multiple Asus motherboards and, except for the specific problem of host sleep/wake, have had no problems whatsoever, even considering I work under GPU-passthrough VMs all day, every day. Perhaps try a recent Asus motherboard?

> have to dual-boot QEMU VMs or disable the passthrough

Yes, you would have to buy as many cheap, secondary GPUs as the number of virtual machines that you want to run in parallel.

> I really want proper GPU virtualisation...

Sure, I don't blame you - my point was that the only truly usable GPU virtualization solution available today is GPU passthrough and that GPU passthrough is much easier to setup than it is commonly perceived.


> Perhaps try a recent Asus motherboard

And if I don't want to or can't afford to buy new hardware?

> Sure, I don't blame you - my point was that the only truly usable GPU virtualization solution available today is GPU passthrough and that GPU passthrough is much easier to setup than it is commonly perceived.

Okay, but for the poster you're replying to it is not available on their hardware.


> Perhaps try a recent Asus motherboard?

I also have an AM4 ASUS board (is that recent enough?) and earlier this year, ASUS decided to completely remove any mention of this board from their site, as if it never existed. So no BIOS updates for me, I guess? No idea if it is up to date, or if my CPU is even supported...

> Yes, you would have to buy as many cheap, secondary GPUs as the number of virtual machines that you want to run in parallel.

Except that cheap GPUs are... cheap, and not very powerful, so depending on what I want to do, I would have to buy a bunch of expensive and powerful GPUs (or go back to rebooting VMs to switch). And there are only so many PCIe slots on my board (two x8 and one x4, the latter of which is already in use by a non-GPU card). Running them in an x1 slot also doesn't sound like a great idea.

I have tried to run it but gave up in the end after breaking BIOS updates and not wanting to spend even more money on another fast GPU.


> earlier this year, ASUS decided to completely remove any mention of this board from their site, as if it never existed.

Maybe your board was only disappeared in some locales? See if you can find it on asus.cn or one of their other regional sites (translation service required, but you can probably muddle through).

I haven't seen that in a long time. AMD-640 Super7 chipsets got disappeared, but back then you could still get the BIOS updates via FTP; the boards just dropped off the website.


> I (...) have an AM4 ASUS board

No, I never tried GPU passthrough with an AMD CPU, Intel only.


There's also virgl_test_server, which runs the virgl protocol over a socket.

This can be used for OpenGL inside containers, by bind-mounting the socket into the container (/tmp/.virgl_test).

It's also useful for debugging with rr: even with the Nvidia driver, you can use OpenGL over virgl with the environment vars __GLX_VENDOR_LIBRARY_NAME=mesa LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=virpipe
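
Putting that together, a quick smoke test looks something like this (glxgears standing in for whatever you actually want to run):

  virgl_test_server &
  __GLX_VENDOR_LIBRARY_NAME=mesa LIBGL_ALWAYS_SOFTWARE=1 \
    GALLIUM_DRIVER=virpipe glxgears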


I've never been able to get this to work under UTM on an ARM Mac; if anyone knows the magic qemu and VM settings, please drop me an email (email in profile).

It would be nice to have fullscreen, full res 3d accelerated VM guests finally. I've never been able to get this to work as smoothly as I think it should.
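
Not sure about UTM specifically, but since it wraps QEMU, the flags that typically enable VirGL on a Linux host might be a useful starting point (display backends and GL support differ on macOS, so treat this as a sketch):

  qemu-system-aarch64 ... \
    -device virtio-gpu-gl-pci \
    -display sdl,gl=on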


I've always found it interesting, and a bit unfortunate, that it seems like things like apitrace are generally pretty reliable, but tools like VirGL seem to often have trouble with typical programs. It seems like the problem of accurately serializing what is being done to the GPU using an API is possible, but in practice there is clearly a lot more to it than that. (Of course, in case of remoting, you do not have the benefit of observing the interactions with a known-good driver and with direct memory mapping/etc. So, that probably answers the "why" right there.)

It's been a while, though. Maybe this works really well... I'm curious to hear if anyone has more recent experience messing with these things.


Very interesting; is there anything similar for Vulkan, or does this also work with Vulkan?


It's called Venus: https://docs.mesa3d.org/drivers/venus.html

But I'm not sure how far along it is; virt-manager doesn't seem to be aware of it yet.
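
For anyone wanting to try it by hand, the flags are reportedly along these lines on QEMU builds with Venus enabled (hostmem size arbitrary here; very much a sketch, not gospel):

  qemu-system-x86_64 ... \
    -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
    -display sdl,gl=on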


ChromeOS uses it for Steam support which is generally available now, as well as for Android apps, so it must work reasonably well: https://chromeos.dev/en/posts/improving-vulkan-availability-...


Why does ChromeOS need virtualization for Steam in general?


Security, defense in depth.


Gfxstream, used by the Android emulator and others.


Vulkan doesn’t come into play here. Vulkan is a graphics API. This project is hardware GPU virtualization in QEMU / libvirt.


That's incorrect; the goal of VirGL is to pass the GL commands themselves (the Gallium intermediate representation of them, actually) from the guest to the host, so they are executed on the host's accelerated Mesa stack. It is not about virtualising the GPU hardware itself.

So it makes sense to one day do the same for Vulkan, which is much lower level and might be able to extract more performance.


I'm wondering if this approach could mean we'd one day be able to have CUDA running on non-Nvidia GPUs.


> Out of scope: Passing through GPUs or subsets of GPU capabilities.

Perhaps, but that should probably be a different project. They want a different focus here.


I tried to use it 1.5 years ago when playing CSGO inside a VM. It worked, but the delay (between mouse movements and changes on the screen) quickly rose to something like 0.5-1s.

It required some host daemon cooperating with QEMU back then (virgldeamonsthsth).


Very interesting!

Now we need to make a translator for the DOS version of 3dfx Glide 1 & 2, and drivers for Windows 95, 98 and 2000 ^_^

I'm getting on with the 3dfx one this weekend if all goes well.


For context: Apple's hypervisor recently gained the ability to use Rosetta on Intel CPU code if it's already translated (QEMU does the Intel CPU translation), which means one can use QEMU to emulate x86-64 on Apple Silicon...


It's virtual... but passthrough. I would like to see something that's virtual but runs on a pure CPU, x86 or ARM; I don't care how slow.


See LLVMpipe, a little ways up the list in the left-hand pane. It should do exactly what you're looking for.

It's worth noting that the terminology "passthrough" means something else in this particular context, and VirGL is not considered a passthrough solution (as noted in the "out of scope" section). VirGL is a form of paravirtualization often called "API remoting".


Yes, llvmpipe or lavapipe is awesome. It is fully Vulkan 1.3 compliant and the performance is more than adequate for GUI apps, simple 3d apps or running your gfx tests in a CI environment without a GPU.


Wow, llvmpipe looks like what I need. I still need to figure out a way to lie to some other bits that probe the hardware, but this is going to be helpful.

But thanks exDM69, zamadatix, fayalalebrun


On Windows, there's a pure software implementation of Direct3D 11 called WARP https://learn.microsoft.com/en-us/windows/win32/direct3darti...


There was a third-party one around DirectX 9 with a shiny car demo. It later became the foundations of ANGLE, I think (present day SwiftShader implements the Vulkan API): https://developer.chrome.com/blog/swiftshader-brings-softwar...


Doesn't using a software GL driver like llvmpipe do the trick?


There is SwiftShader for that; it works very well.


Uh, this doesn't look like what those GPU passthrough things like VFIO do; is it a different thing? Or, what I'm more interested in: can this do what VFIO does, i.e. run games at almost-native performance in a Windows guest VM?


> run games at almost-native performance in a Windows guest VM?

No, vfio is still the most recommended option.

This is essentially something like VirtualBox or VMWare 3D acceleration - the host is still the "owner" of the GPU, and has to juggle receiving OpenGL commands from the guest and sending the render results back to it, with lots of overhead. But easier for users than setting up passthrough or using proprietary GPU virtualization solutions.

People pointed out that there is a similar project for Vulkan, already being used in production for ChromeOS, called Venus. Should be the one to watch nowadays.


VirGL just sends OpenGL commands to the host, and it also has to copy some buffers back and forth, so it won't be as fast as VFIO. But there is a new feature called VirtGPU DRM native contexts which eliminates some of this overhead and runs at near-native speeds. Unfortunately it's only implemented for Adreno/msm GPUs for now; I think there is work on the way for Intel and AMD GPUs.


How does Mesa3D get funding? It seems to be doing a fair bit.


It's been a while since I've really looked, but I seem to remember that the major contributors are either employed by various hardware/software vendors (Google, Valve, Intel, AMD, VMware) or do contract/consulting work on the Linux graphics stack (LunarG), not unlike the Linux kernel.



