> Does not work if the OS uses the IOMMU/VT-d. This is the default on macOS (unless disabled in recovery mode). Windows 10 with virtualization-based security features enabled does not work fully; this is, however, not the default setting in Windows 10 or Linux.
For PCI (shared bus) the device could just spoof packets, but for PCIe, there are dedicated lanes for each device. I wonder if a physical MITM device is practical.
Roughly 10 years ago, I knew a guy who sold various exfil/border-extraction products to state actors including FireWire dongles similar to Inception (they may even be effectively the same thing, IDK). He specialized in memory dumping attacks on several platforms including Windows, FreeBSD, Linux and macOS.
The core problem is that buses should be authenticated, authorized, encrypted, and least-privileged channels. Exposing a memory and expansion bus to the outside world in the name of "convenience" is insane. A trusted set of components in the OS and in hardware should:
0. Be able to perform HwIDS-style checksumming of all firmware to detect tampering.
1. Limit devices' ability to connect unless they are authorized by the user, much like a "hardware firewall" UI: vaguely similar to VMware Workstation/Fusion's dialog when plugging in a new USB device, mixed with something like Little Snitch's dialog for a process wanting to connect to a particular port.
2. Authenticate devices with burned-in public/private key certificates, a function on the device that can answer challenge requests, and a Signal-protocol-like construction properly adapted for PKI. Then, and only then, can a host talk securely to a device over an encrypted channel.
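The host-side half of that handshake could look roughly like this (a minimal sketch using libsodium; the device key pair is generated in-process for the demo, and the CA/certificate-chain check and the encrypted session setup afterwards are hand-waved):

```c
/* Sketch: host verifies a device's burned-in identity key via a signed
 * challenge-response. In a real system the device's public key would be
 * validated against a vendor/CA certificate chain before being trusted. */
#include <sodium.h>
#include <stdio.h>

int main(void) {
    if (sodium_init() < 0) return 1;

    /* Stand-in for the key pair burned into the device at manufacture. */
    unsigned char dev_pk[crypto_sign_PUBLICKEYBYTES];
    unsigned char dev_sk[crypto_sign_SECRETKEYBYTES];
    crypto_sign_keypair(dev_pk, dev_sk);

    /* Host sends a fresh random challenge over the bus. */
    unsigned char challenge[32];
    randombytes_buf(challenge, sizeof challenge);

    /* Device signs the challenge with its private key (done on-device). */
    unsigned char sig[crypto_sign_BYTES];
    crypto_sign_detached(sig, NULL, challenge, sizeof challenge, dev_sk);

    /* Host verifies with the certified public key; only then would it go on
     * to derive an encrypted session and allow the device to attach/DMA. */
    if (crypto_sign_verify_detached(sig, challenge, sizeof challenge, dev_pk) == 0)
        puts("device authenticated: allow it to attach");
    else
        puts("authentication failed: keep device quarantined");
    return 0;
}
```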
We never sold anything for it, but for some demos back around 2004/2005 we flashed some FireWire iPods with custom firmware that performed the attack described here:
(To be clear, those slides aren't mine, but I can no longer find the firmware we based ours off of, and never did get permission to post source to our mods, which were never distributed.)
I know a couple of our customers took entirely the wrong lesson from our demonstrations and banned mp3 players from their office buildings entirely afterwards :)
Many moons ago around 1998 in uni, I had an HP 48 graphing calculator which was both programmable and had a very powerful IR LED that was usually used for serial transfers between similar devices. It so happened that some enterprising soul made a customizable, preset and code-learning IR remote control app for it that worked, so I put the brand of the lecture hall's 4 TVs in it. From the back of the room, some 15 meters away, I subtly turned on all of the TVs with a phreak'n calculator. Disbelief, confusion and hilarity ensued.
Woah, relax there before you have a stroke. You're overreacting and jumping to erroneous conclusions while fetishizing and trivializing gross insecurity.
Open standards are obvious and essential, and as such a CA for one doesn't have to be owned by any corporation; it is best run by a real non-profit that is independent of governments and free of bias towards any particular vendor. The hardware manufacturers would pay a nominal fee for a perpetual certificate, and millions/billions of people get devices that can't be snooped on from the outside by authoritarian regimes.
The problem is that, used properly, IOMMUs are horribly expensive.
Consider a NIC driver where you're mapping an outgoing packet for DMA. What used to be essentially a virtual-to-physical translation becomes a virt-to-phys lookup, plus entering the physical address into the IOMMU, plus removing the mapping when the transmit is complete. This is expensive for hardware and software reasons. At one point I benchmarked a 100G setup on Linux, and with the IOMMU enabled we lost about 90% of the bandwidth, and most of the CPU time was spent in lock contention over the red-black tree that managed the IOMMU tables. This was 5-ish years ago, so perhaps things have gotten better.
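For context, the per-packet pattern looks roughly like this in a Linux NIC driver's transmit path (illustrative fragment only, not a complete driver; my_xmit and my_tx_complete are made-up names). With the IOMMU off, dma_map_single() is essentially a virt-to-phys translation; with it on, every call also installs (and later tears down) an IOMMU mapping:

```c
#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_tx_t my_xmit(struct sk_buff *skb, struct net_device *ndev)
{
    struct device *dev = ndev->dev.parent;
    dma_addr_t addr;

    /* One IOMMU table update (plus locking) per packet sent... */
    addr = dma_map_single(dev, skb->data, skb_headlen(skb), DMA_TO_DEVICE);
    if (dma_mapping_error(dev, addr))
        return NETDEV_TX_BUSY;

    /* ...post 'addr' and the length to the NIC's descriptor ring here. */
    return NETDEV_TX_OK;
}

static void my_tx_complete(struct device *dev, dma_addr_t addr, size_t len)
{
    /* ...and another table update plus an IOTLB invalidation per
     * completion, before the buffer can be reused. */
    dma_unmap_single(dev, addr, len, DMA_TO_DEVICE);
}
```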
So that makes people want to just enable the IOMMU for SR-IOV (and full device) passthrough to VMs. This is cheaper, since you just set the mappings up when you allocate physical memory for the guest, and tear them down when freeing it.
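A rough sketch of that one-time setup via VFIO (the IOMMU group number and sizes are placeholders; device wiring, group-viability checks, and error handling are trimmed for brevity):

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
    size_t size = 1UL << 30;                      /* 1 GiB of "guest RAM" */
    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open("/dev/vfio/26", O_RDWR);     /* IOMMU group: placeholder */

    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    void *guest_ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* One-time IOMMU mapping covering all of guest RAM; after this the
     * device (or VF) can DMA anywhere in the guest without any further
     * per-operation mapping work on the host. */
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)guest_ram,
        .iova  = 0,
        .size  = size,
    };
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    return 0;
}
```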
macOS used to use a really cool trick where they pre-mapped all mbufs into the IOMMU. That made network transmit and receive comparatively fast. However, it also prevented lots of optimizations that modern operating systems use for zero-copy IO (like attaching pages from sendfile directly to mbufs, similar to skb_frags).
A problem on the hardware side is that Intel's IOMMU TLB is tiny (64 entries), so using huge pages for all DMA-accessible memory is absolutely required to get good performance out of it.
Nice paper, and thanks for the reminder of how small the IOMMU TLB is. We never hit this because we were testing full-sized packets (and really bigger, because of TSO) and hit host IOMMU management overheads at ~100k to 200k TSO sends/sec.
I think ~100k to 200k TSO "packets" per second should be doable with the IOMMU. But I guess it depends where the data is coming from. Could be one of the odd cases where copying data is faster than doing zero-copy, e.g., just copy everything into the same small set of small-ish buffers to keep the number of pages that need to be present in the IOMMU small?
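That copy-into-pre-mapped-buffers idea could look something like this in a driver (fragment only; ring management and wrap-around handling are omitted, and all names are invented for illustration):

```c
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>
#include <linux/string.h>

#define RING_SLOTS 256
#define SLOT_SIZE  2048   /* enough for a full-sized, non-TSO frame */

struct tx_slot {
    void       *cpu;      /* kernel virtual address */
    dma_addr_t  dma;      /* bus address, already present in the IOMMU */
};

static struct tx_slot ring[RING_SLOTS];

static int my_probe_map_ring(struct device *dev)
{
    for (int i = 0; i < RING_SLOTS; i++) {
        ring[i].cpu = dma_alloc_coherent(dev, SLOT_SIZE, &ring[i].dma, GFP_KERNEL);
        if (!ring[i].cpu)
            return -ENOMEM;
    }
    return 0;   /* only RING_SLOTS * SLOT_SIZE bytes ever live in the IOMMU */
}

static void my_xmit_copy(struct sk_buff *skb, unsigned int slot)
{
    /* The copy costs CPU cycles, but avoids any per-packet IOMMU
     * map/unmap/flush work and keeps the IOTLB footprint tiny. */
    memcpy(ring[slot].cpu, skb->data, skb_headlen(skb));
    /* ...post ring[slot].dma and the length to the NIC descriptor... */
}
```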
This sounds like a driver bug / misconfiguration. Once the IOMMU is set up, there isn’t any host side management to be done (other than managing the IOMMU TLB, but unless it’s thrashing, that’s a no-op on the fast path).
At most, each kernel driver has to do an extra addition to map its physical I/O offset to the one exposed to the bus by the IOMMU. With huge pages, there’s approximately one offset per driver, so it lives in cache, probably next to other driver state.
FWIW: 64 entries isn't particularly small for a dTLB, IIRC that's exactly the size on current Intel cores. The real problem is that device DMA, unlike software behavior, is distressingly non-local. The device will stream out a packet or storage block and then never touch that memory again (or not for a very long time -- memory buffers are huge relative to bandwidth on these things). The TLB just doesn't do you much good.
There's only one level of TLBs in the IOMMU. And that's 64 entries.
Yeah, I think the dTLB is only 64 entries on Intel CPUs as well, but there's a second larger layer behind that, and an even larger third layer. IIRC it's a total of 4096 entries on recent Intel CPUs.
Ideally, one would only need this level of IOMMU enforcement for externally attached devices (i.e. Thunderbolt). However, the problem is that the vast majority of kernel drivers in most OSes were not written with malicious devices in mind, so they often do things like putting kernel address space pointers in device-writeable areas.
For instance, XHCI (USB3) has an "address" field in its buffer descriptor structures that drivers can put anything into, and the controller will copy that value into the "transfer finished" message. Some USB3 drivers just put the kernel address of their bookkeeping record in there; or worse, allocate the bookkeeping records alongside the physical buffers themselves, so a malicious device could manipulate them if it knew where to look.
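Roughly the pattern (simplified, not the real xHCI structures; all names are invented for the example):

```c
#include <stddef.h>
#include <stdint.h>

struct xfer_desc {
    uint64_t buf_addr;     /* DMA address of the data buffer */
    uint32_t length;
    uint32_t flags;
    uint64_t cookie;       /* echoed back by the device in the completion event */
};

struct xfer_record { int urb_id; void *done_cb; };   /* driver bookkeeping */

/* Risky: the cookie is a raw kernel pointer. A malicious device that can
 * read/write descriptor memory learns kernel addresses, and if the driver
 * blindly dereferences the echoed value, the device controls a pointer. */
static void submit_unsafe(struct xfer_desc *d, struct xfer_record *rec)
{
    d->cookie = (uintptr_t)rec;
}

/* Safer: the cookie is only an index into a driver-owned table, validated
 * on completion, so the device never sees or chooses kernel addresses. */
static struct xfer_record table[256];

static void submit_safer(struct xfer_desc *d, uint32_t idx)
{
    d->cookie = idx;
}

static struct xfer_record *complete_safer(uint64_t cookie)
{
    if (cookie >= 256)
        return NULL;       /* reject out-of-range values from the device */
    return &table[cookie];
}

int main(void) { return 0; }   /* nothing to run; the two patterns are the point */
```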
Does the attack in its current form work over Thunderbolt? I'd really like to mount my RAM as a drive, and see the files that various programs have open (the waveform from Rogue Amoeba's Fission, among others). Using gore just gave me a 2GB raw file, and it's hard for me to spot files (JPG, PNG) in there.
I think the most expensive part isn't the address translation itself but the TLB housekeeping when mappings get remapped or invalidated. This is especially true with virtualization, where the hypervisor often needs to do extra work like (un)pinning guest pages, translating guest real addresses to host real addresses, and reissuing TLB flushes.
> At one point I benchmarked a 100g setup on linux
Datacenters have controlled physical access. I think the IOMMU is far more important for anything with exposed Thunderbolt ports (including the upcoming USB4). So laptops, smartphones, and workstations can still benefit from it even if it's currently not viable for cloud server-class workloads.
Intel used IOMMU support in their CPUs for product segmentation until Thunderbolt came around. It's now pretty widely supported in hardware, but seldom enabled by default at both the motherboard firmware level and OS level.
Yep, they're also pretty popular with people who cheat in ESEA/FaceIT third-party ladders. One of the guys, ra1f, eventually outed himself [1]. The software-based anticheat is fairly decent, pushing people to hardware-based cheats built off of PCILeech to avoid detection.
Does DMA over PCIe work using USB gadget mode with a Linux device? i.e. could a Pi be used easily and inexpensively to build an acquisition device for this?
with a PCIe adapter connected over what looked like USB3 and forgot that it's Thunderbolt on the MacBook. I was not quite to the middle of my first cup of coffee when I asked that.
A few of the better anti-cheat products have had detection vectors for these for a while now, from simple things like detecting the driver to outright probing the device.
Introducing the Memory Process File System for PCILeech http://blog.frizk.net/2018/03/memory-process-file-system.htm...
Using your BMC as a DMA device: Plugging PCILeech to HPE ILO 4 https://www.synacktiv.com/posts/exploit/using-your-bmc-as-a-...