MIG virtualization is IMHO weak sauce. Only seven slices. Seven? Extremely limited hardware support. Difficult to configure - like the early days of CUDA. It’s been in the works for what, seven years now, and it's still barely functional.
Meanwhile, don’t forget that if your workloads are cooperative, you can put all the processes you want on a single GPU and they’ll happily multitask. No security boundary of course, but who knows how good MIG is at that.
I’d greatly prefer better tools for cooperative GPU sharing, like per-process memory limits or compute priority levels. That also seems like it should be far easier to implement. As containerization and k8s have proven, there’s a ton of utility in bin-packing your own workloads better, even without rock-solid security boundaries.
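For what it's worth, MPS does expose a couple of knobs roughly in this direction. A sketch using the documented MPS environment variables (exact behavior depends on driver and CUDA version; `my_cuda_app` is a hypothetical workload):

```shell
# Restrict MPS clients launched from this shell to ~50% of the SMs
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50

# Cap device memory allocatable by MPS clients: 8 GiB on device index 0
# (available on newer CUDA versions, Volta and later)
export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=8G"

# Launch the workload; it attaches to the MPS server with these limits
./my_cuda_app   # hypothetical app, stands in for your CUDA process
```

These are cooperative limits within MPS, not a security boundary - which is sort of the point.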
I know several HPC sites that use it: they (e.g.) ordered cookie-cutter server designs/models to simplify logistics, but not all of their users need the complete capabilities, and so they slice/dice some portion into smaller instances for smaller jobs.
At some point the slices become so small that they stop being useful. An A100 can have as 'little' as 40G of memory, and at seven slices you're down to about 5G per instance.
It's a reasonable argument that you'd only need it at the top end of the hardware: workloads that need all that compute and memory aren't that common, so downshifting some hardware into resource slices that are more typical isn't crazy. You can then upshift when needed. But if you had purchased 'smaller' cards because that's what you (initially) thought you needed, then you're stuck at that level: there's no way for you to upshift.
> Difficult to configure - like the early days of CUDA.
How hard is it to run nvidia-smi?
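For reference, the basic flow really is just a handful of nvidia-smi commands (profile IDs vary by card; the ones below are examples for a 40GB A100 - check `-lgip` output on your own hardware):

```shell
# Enable MIG mode on GPU 0 (requires no running clients; may need a reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card supports
nvidia-smi mig -lgip

# Create one 3g.20gb and two 1g.5gb instances, plus default compute
# instances (-C); 9 and 19 are the A100-40GB profile IDs for those sizes
sudo nvidia-smi mig -cgi 9,19,19 -C

# Verify the instances exist
nvidia-smi -L
```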
> Meanwhile, don’t forget that if your workloads are cooperative, you can put all the processes you want on a single GPU and they’ll happily multitask. No security boundary of course, but who knows how good MIG is at that.
The security boundary of MIG is a lot better than MPS, which basically has no security. I know several folks running HPC clusters that use it to isolate the Slurm workloads of different users. And my search-fu has found no CVEs or published papers jailbreaking out of MIG instances.
> I’d greatly prefer better tools for cooperative GPU sharing like per process memory limits or compute priority levels. Also seems like it should be way easier to implement.
Contra another comment: fairly low. (Or at least my search-fu has not been able to find any CVEs or published papers about breaking isolation between MIG instances. MPS should generally be used only by one user, so that multiple of their own CUDA apps can attach to one (v)GPU.)
MIG is used a lot in HPC and multi-tenancy cloud, where isolation is important. See Figure 1 and §6.2:
The card is actually sliced into different instances (they show up as different /dev/nvidiaXs), each with their own SMs, L2, and DRAM, isolated from one another. (MPS is for the same user to share a GPU instance: it allows multiple CUDA apps to attach, and time-slicing occurs.)
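E.g., after partitioning, each MIG instance gets its own UUID that a process can be pinned to. Roughly (UUIDs elided; output format may differ slightly across driver versions):

```shell
$ nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-<...>)
  MIG 1g.5gb Device 0: (UUID: MIG-<...>)
  MIG 1g.5gb Device 1: (UUID: MIG-<...>)

# Pin a process to one instance by its MIG UUID
$ CUDA_VISIBLE_DEVICES=MIG-<...> ./my_app   # hypothetical workload
```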
I remember a few years ago my hardware security professor suggested we try to implement Rowhammer on GPU. I ended up doing something else, but it looks like someone got there: https://arxiv.org/abs/2507.08166
MPS should only be used where all the workloads trust each other. It is similar to running multiple games on your computer simultaneously.
You cannot use NVLink with MPS or MIG: it is not isolated, and malformed NVLink messages authored in userspace can crash the whole GPU. Some vendors, like Modal, allow you to request NVLink'd shared GPUs anyway.
MIG only makes sense for cloud providers. MPS only makes sense for interactive (read: not ML) workloads. Workloads needing more than 1 GPU cannot use either.
I do not see MIG mentioned in either paper. I do not think the papers are examining isolation security between instances, which the GP was asking about.
As per sibling comment, this is about utilization efficiency and not breaking isolation (between MIG instances). The conclusion:
> In this paper, we presented MISO, a technique to leverage the MIG functionality on NVIDIA A100 GPUs to dynamically partition GPU resources among co-located jobs. MISO deploys a learning-based method to quickly find the optimal MIG partition for a given job mix running in MPS. MISO is evaluated using a variety of deep learning workloads and achieves an average job completion time that is lower than the unpartitioned GPU scheme by 49% and is within 10% of the Oracle technique.
* https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
* https://www.nvidia.com/en-us/technologies/multi-instance-gpu...
Or having multiple processes from one user share it:
* https://docs.nvidia.com/deploy/mps/index.html
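Basic usage is just starting the control daemon; CUDA processes from the same user then attach as MPS clients (`app_a`/`app_b` are hypothetical workloads):

```shell
# Start the MPS control daemon in the current environment
nvidia-cuda-mps-control -d

# CUDA apps launched now share the GPU through the MPS server
./app_a & ./app_b &

# Shut the daemon down when done
echo quit | nvidia-cuda-mps-control
```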