> Does not work if the OS uses the IOMMU/VT-d. This is the default on macOS (unless disabled in recovery mode). Windows 10 with virtualization-based security features enabled does not work fully; this is, however, not the default setting in Windows 10 or Linux.
For PCI (shared bus) the device could just spoof packets, but for PCIe, there are dedicated lanes for each device. I wonder if a physical MITM device is practical.
Roughly 10 years ago, I knew a guy who sold various exfil/border-extraction products to state actors including FireWire dongles similar to Inception (they may even be effectively the same thing, IDK). He specialized in memory dumping attacks on several platforms including Windows, FreeBSD, Linux and macOS.
The core problem is that buses should be authenticated, authorized, encrypted, and least-privileged channels. Exposing a memory and expansion bus to the outside world in the name of "convenience" is insane. A trusted set of components in the OS and in hardware should:
0. Be able to perform HwIDS-style checksumming of all firmware to detect tampering.
1. Limit devices' ability to connect unless they are authorized by the user, much like a "hardware firewall" UI: vaguely similar to VMware Workstation/Fusion's dialog when plugging in a new USB device, mixed with something like Little Snitch's dialog for a process wanting to connect to a particular port.
2. Authenticate devices with burned-in public/private key certificates, a function on the device that can answer challenge requests, and a Signal-protocol-like construction properly adapted for PKI. Then, and only then, can a host talk securely to a device over an encrypted channel.
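The host-side half of that handshake could look roughly like this (a minimal sketch using libsodium; the device key pair is generated in-process for the demo, and the CA/certificate-chain check and the encrypted session setup afterwards are hand-waved):

```c
/* Sketch: host verifies a device's burned-in identity key via a signed
 * challenge-response. In a real system the device's public key would be
 * validated against a vendor/CA certificate chain before being trusted. */
#include <sodium.h>
#include <stdio.h>

int main(void) {
    if (sodium_init() < 0) return 1;

    /* Stand-in for the key pair burned into the device at manufacture. */
    unsigned char dev_pk[crypto_sign_PUBLICKEYBYTES];
    unsigned char dev_sk[crypto_sign_SECRETKEYBYTES];
    crypto_sign_keypair(dev_pk, dev_sk);

    /* Host sends a fresh random challenge over the bus. */
    unsigned char challenge[32];
    randombytes_buf(challenge, sizeof challenge);

    /* Device signs the challenge with its private key (done on-device). */
    unsigned char sig[crypto_sign_BYTES];
    crypto_sign_detached(sig, NULL, challenge, sizeof challenge, dev_sk);

    /* Host verifies with the certified public key; only then would it go on
     * to derive an encrypted session and allow the device to attach/DMA. */
    if (crypto_sign_verify_detached(sig, challenge, sizeof challenge, dev_pk) == 0)
        puts("device authenticated: allow it to attach");
    else
        puts("authentication failed: keep device quarantined");
    return 0;
}
```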
We never sold anything for it, but for some demos back around 2004/2005 we flashed some FireWire iPods with custom firmware that performed the attack described here:
(To be clear, those slides aren't mine, but I can no longer find the firmware we based ours off of, and never did get permission to post source to our mods, which were never distributed.)
I know a couple of our customers took entirely the wrong lesson from our demonstrations and banned mp3 players from their office buildings entirely afterwards :)
Many moons ago around 1998 in uni, I had an HP 48 graphing calculator which was both programmable and had a very powerful IR LED that was usually used for serial transfers between similar devices. It so happened that some enterprising soul made a customizable, preset and code-learning IR remote control app for it that worked, so I put the brand of the lecture hall's 4 TVs in it. From the back of the room, some 15 meters away, I subtly turned on all of the TVs with a phreak'n calculator. Disbelief, confusion and hilarity ensued.
Woah, relax there before you have a stroke. You're overreacting and jumping to erroneous conclusions while fetishizing and trivializing gross insecurity.
Open standards are obvious and essential, and as such a CA for one doesn't have to be owned by any corporation; it is best run by a real non-profit that is independent of governments and free of bias towards any particular vendor. The hardware manufacturers would pay a nominal fee for a perpetual certificate, and millions/billions of people get devices that can't be snooped on from the outside by authoritarian regimes.
The problem is that, used properly, IOMMUs are horribly expensive.
Consider a NIC driver where you're mapping an outgoing packet for DMA. What used to be essentially a virtual-to-physical translation becomes a virt-to-phys lookup, plus entering the physical address into the IOMMU, plus removing the mapping when the transmit is complete. This is expensive for hardware and software reasons. At one point I benchmarked a 100G setup on Linux, and with the IOMMU enabled we lost about 90% of the bandwidth, and most of the CPU time was spent in lock contention over the red-black tree that managed the IOMMU tables. This was 5-ish years ago, so perhaps things have gotten better.
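For context, the per-packet pattern looks roughly like this in a Linux NIC driver's transmit path (illustrative fragment only, not a complete driver; my_xmit and my_tx_complete are made-up names). With the IOMMU off, dma_map_single() is essentially a virt-to-phys translation; with it on, every call also installs (and later tears down) an IOMMU mapping:

```c
#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_tx_t my_xmit(struct sk_buff *skb, struct net_device *ndev)
{
    struct device *dev = ndev->dev.parent;
    dma_addr_t addr;

    /* One IOMMU table update (plus locking) per packet sent... */
    addr = dma_map_single(dev, skb->data, skb_headlen(skb), DMA_TO_DEVICE);
    if (dma_mapping_error(dev, addr))
        return NETDEV_TX_BUSY;

    /* ...post 'addr' and the length to the NIC's descriptor ring here. */
    return NETDEV_TX_OK;
}

static void my_tx_complete(struct device *dev, dma_addr_t addr, size_t len)
{
    /* ...and another table update plus an IOTLB invalidation per
     * completion, before the buffer can be reused. */
    dma_unmap_single(dev, addr, len, DMA_TO_DEVICE);
}
```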
So that makes people want to just enable the IOMMU for SR-IOV (and full device) passthrough to VMs. This is cheaper, since you just set the mappings up when you allocate physical memory for the guest, and tear them down when freeing it.
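A rough sketch of that one-time setup via VFIO (the IOMMU group number and sizes are placeholders; device wiring, group-viability checks, and error handling are trimmed for brevity):

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
    size_t size = 1UL << 30;                      /* 1 GiB of "guest RAM" */
    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open("/dev/vfio/26", O_RDWR);     /* IOMMU group: placeholder */

    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    void *guest_ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* One-time IOMMU mapping covering all of guest RAM; after this the
     * device (or VF) can DMA anywhere in the guest without any further
     * per-operation mapping work on the host. */
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)guest_ram,
        .iova  = 0,
        .size  = size,
    };
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    return 0;
}
```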
macOS used to use a really cool trick where they pre-mapped all mbufs into the IOMMU. That made network transmit and receive comparatively fast. However, it also prevented lots of optimizations that modern operating systems use for zero-copy IO (like attaching pages from sendfile directly to mbufs, similar to skb_frags).
A problem on the hardware side is that Intel's IOMMU TLB is tiny (64 entries), so using huge pages for all DMA-accessible memory is absolutely required to get good performance out of it.
Nice paper, and thanks for the reminder of how small the IOMMU TLB is. We never hit this because we were testing full-sized packets (and really bigger, because of TSO) and hit host IOMMU management overheads at ~100k to 200k TSO sends/sec.
I think ~100k to 200k TSO "packets" per second should be doable with the IOMMU. But I guess it depends where the data is coming from. Could be one of the odd cases where copying data is faster than doing zero-copy, e.g., just copy everything into the same small set of small-ish buffers to keep the number of pages that need to be present in the IOMMU small?
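That copy-into-pre-mapped-buffers idea could look something like this in a driver (fragment only; ring management and wrap-around handling are omitted, and all names are invented for illustration):

```c
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>
#include <linux/string.h>

#define RING_SLOTS 256
#define SLOT_SIZE  2048   /* enough for a full-sized, non-TSO frame */

struct tx_slot {
    void       *cpu;      /* kernel virtual address */
    dma_addr_t  dma;      /* bus address, already present in the IOMMU */
};

static struct tx_slot ring[RING_SLOTS];

static int my_probe_map_ring(struct device *dev)
{
    for (int i = 0; i < RING_SLOTS; i++) {
        ring[i].cpu = dma_alloc_coherent(dev, SLOT_SIZE, &ring[i].dma, GFP_KERNEL);
        if (!ring[i].cpu)
            return -ENOMEM;
    }
    return 0;   /* only RING_SLOTS * SLOT_SIZE bytes ever live in the IOMMU */
}

static void my_xmit_copy(struct sk_buff *skb, unsigned int slot)
{
    /* The copy costs CPU cycles, but avoids any per-packet IOMMU
     * map/unmap/flush work and keeps the IOTLB footprint tiny. */
    memcpy(ring[slot].cpu, skb->data, skb_headlen(skb));
    /* ...post ring[slot].dma and the length to the NIC descriptor... */
}
```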
This sounds like a driver bug / misconfiguration. Once the IOMMU is set up, there isn’t any host side management to be done (other than managing the IOMMU TLB, but unless it’s thrashing, that’s a no-op on the fast path).
At most, each kernel driver has to do an extra addition to map its physical I/O offset to the one exposed to the bus by the IOMMU. With huge pages, there’s approximately one offset per driver, so it lives in cache, probably next to other driver state.
FWIW: 64 entries isn't particularly small for a dTLB, IIRC that's exactly the size on current Intel cores. The real problem is that device DMA, unlike software behavior, is distressingly non-local. The device will stream out a packet or storage block and then never touch that memory again (or not for a very long time -- memory buffers are huge relative to bandwidth on these things). The TLB just doesn't do you much good.
There's only one level of TLBs in the IOMMU. And that's 64 entries.
Yeah, I think the dTLB is only 64 entries on Intel CPUs as well, but there's a second larger layer behind that, and an even larger third layer. IIRC it's a total of 4096 entries on recent Intel CPUs.
Ideally, one would only need this level of IOMMU enforcement for externally attached devices (i.e. Thunderbolt). However, the problem is that the vast majority of kernel drivers in most OSes were not written with malicious devices in mind, so they often do things like putting kernel address space pointers in device-writeable areas.
For instance, XHCI (USB3) has an "address" field in its buffer descriptor structures that drivers can put anything into, and the controller will copy that value into the "transfer finished" message. Some USB3 drivers just put the kernel address of their bookkeeping record in there; or worse, allocate the bookkeeping records alongside the physical buffers themselves, so a malicious device could manipulate them if it knew where to look.
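Roughly the pattern (simplified, not the real xHCI structures; all names are invented for the example):

```c
#include <stddef.h>
#include <stdint.h>

struct xfer_desc {
    uint64_t buf_addr;     /* DMA address of the data buffer */
    uint32_t length;
    uint32_t flags;
    uint64_t cookie;       /* echoed back by the device in the completion event */
};

struct xfer_record { int urb_id; void *done_cb; };   /* driver bookkeeping */

/* Risky: the cookie is a raw kernel pointer. A malicious device that can
 * read/write descriptor memory learns kernel addresses, and if the driver
 * blindly dereferences the echoed value, the device controls a pointer. */
static void submit_unsafe(struct xfer_desc *d, struct xfer_record *rec)
{
    d->cookie = (uintptr_t)rec;
}

/* Safer: the cookie is only an index into a driver-owned table, validated
 * on completion, so the device never sees or chooses kernel addresses. */
static struct xfer_record table[256];

static void submit_safer(struct xfer_desc *d, uint32_t idx)
{
    d->cookie = idx;
}

static struct xfer_record *complete_safer(uint64_t cookie)
{
    if (cookie >= 256)
        return NULL;       /* reject out-of-range values from the device */
    return &table[cookie];
}

int main(void) { return 0; }   /* nothing to run; the two patterns are the point */
```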
Does the attack in its current form work over Thunderbolt? I'd really like to mount my RAM as a drive, and see the files that various programs have open (the waveform from Rogue Amoeba's Fission, among others). Using gore just gave me a 2GB raw file, and it's hard for me to spot files (JPG, PNG) in there.
I think the most expensive part isn't the address translation itself but the TLB housekeeping when mappings get remapped or invalidated. This is especially true with virtualization, where the hypervisor often needs to do extra work like (un)pinning guest pages, translating guest real addresses to host real addresses, and reissuing TLB flushes.
> At one point I benchmarked a 100g setup on linux
Datacenters have controlled physical access. I think the IOMMU is far more important for anything with exposed Thunderbolt ports (including the upcoming USB4). So laptops, smartphones, and workstations can still benefit from it even if it's currently not viable for cloud server-class workloads.
Intel used IOMMU support in their CPUs for product segmentation until Thunderbolt came around. It's now pretty widely supported in hardware, but seldom enabled by default at both the motherboard firmware level and OS level.
Yep, they're also pretty popular with people who cheat in ESEA/FaceIT third-party ladders. One of the guys, ra1f, eventually outed himself [1]. The software-based anticheat is fairly decent, pushing people to hardware-based cheats built off of PCILeech to avoid detection.
Does DMA over PCIe work using USB gadget mode with a Linux device? i.e. could a Pi be used easily and inexpensively to build an acquisition device for this?
with a PCIe adapter connected over what looked like USB3 and forgot that it's Thunderbolt on the MacBook. I was not quite to the middle of my first cup of coffee when I asked that.
A few of the better anti-cheat products have had detection vectors for these for a while now, from simple things like detecting the driver to outright probing the device.
Introducing the Memory Process File System for PCILeech http://blog.frizk.net/2018/03/memory-process-file-system.htm...
Using your BMC as a DMA device: Plugging PCILeech to HPE ILO 4 https://www.synacktiv.com/posts/exploit/using-your-bmc-as-a-...