http://www.ssi.gouv.fr/IMG/pdf/csw-trustnetworkcard.pdf
Network cards which support the RMCP/IPMI protocol are obvious points of attack. They can reboot machines, download boot images, install a new OS, patch memory, emulate a local console, and control the entire machine. CERT has some warnings:
https://www.us-cert.gov/ncas/alerts/TA13-207A
If there's a default password in a network card, that's a backdoor. Here's a list of the default passwords for many common systems:
https://community.rapid7.com/community/metasploit/blog/2013/...
"admin/admin" is popular.
The network card stores passwords in non-volatile memory. If anyone in the supply chain gets hold of the card briefly, they can add a backdoor: plug the card into a chassis for power, connect a network cable, and add an extra user/password of their own using Linux "ipmitool" running on another machine. The card, when delivered to the end user, now has a backdoor installed. If you have any servers you're responsible for, try connecting with IPMI and running a "user list" command to see which users are configured. If you find any you didn't put there, that's a big problem.
CERT warns that, if you use the same userid/password for multiple machines in your data center, discarded boards contain that password. So discarded boards must be shredded.
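For anyone who wants to automate that check, here is a minimal sketch (Node.js, assuming ipmitool is installed; the BMC address, the credentials, and the expected account names are placeholders to replace with your own):

    // Flag IPMI/BMC user accounts that you didn't create yourself.
    import { execSync } from "child_process";

    const bmc = "10.0.0.42";                    // placeholder BMC/IPMI address
    const expected = new Set(["root", "ops"]);  // accounts you actually created

    // "ipmitool user list 1" prints the user table for LAN channel 1.
    const out = execSync(
      `ipmitool -I lanplus -H ${bmc} -U root -P changeme user list 1`,
      { encoding: "utf8" }
    );

    for (const line of out.split("\n").slice(1)) {   // skip the header row
      const cols = line.trim().split(/\s+/);         // rough parse; 2nd column is the name
      if (cols.length >= 2 && !expected.has(cols[1])) {
        console.log("Unexpected IPMI user:", cols[1]);
      }
    }

The same ipmitool invocation works straight from a shell; the script just makes the comparison against your own expected list explicit.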
Absolutely. NICs in general are a very fruitful vector for persistence, and have been extensively studied by the NSA.
Generally, anything with a microcontroller is a potential problem: anything that runs firmware (BIOS or UEFI), has DMA access (via PCI, PCIe, FireWire), is a storage peripheral that can pass code to the boot process (HDD/SSD/CD/DVD/BD/flash drive/memory card firmware, including USB), or is an input device (USB).
That is a pretty damn big attack surface, and civilian researchers are able to do this too. The only big advantages Nation State Adversaries really have are funding and, occasionally, vendor cooperation - although I'd expect that to be rare in this case for operational security reasons. They might get datasheets under false pretenses, but so could we; we just wouldn't get away with it if caught <g>.
The TPM architecture isn't so much a problem here as an attempted solution, but it falls short and has downsides too.
Supply chain integrity is a huge, possibly unsolvable problem. I'd be interested, however, to see solutions that massively complicate any such attack, like an open trusted processor that boots from ROM which is externally readable in hardware with no override, and that keeps secure hash chains of the firmware it loads - again, externally verifiable with no way to override them in firmware. That would put a crimp in their day.
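To illustrate the hash-chain half of that idea, here is a minimal sketch of a TPM-style "extend" operation, assuming SHA-256; the firmware stages below are placeholder buffers, not a real boot chain:

    import { createHash } from "crypto";

    const sha256 = (b: Buffer) => createHash("sha256").update(b).digest();

    // Fold the next firmware image into the running chain value.
    const extend = (chain: Buffer, blob: Buffer) =>
      sha256(Buffer.concat([chain, sha256(blob)]));

    // Placeholder stages; in hardware these would be the images read at boot.
    const stages = [Buffer.from("boot rom"), Buffer.from("loader"), Buffer.from("kernel")];

    let chain = Buffer.alloc(32);   // well-known starting value, fixed in ROM
    for (const stage of stages) {
      chain = extend(chain, stage);
    }

    // `chain` is what the externally readable register would report. Changing
    // any stage changes every later value, and because the register is
    // hardware-only, firmware has no way to override the record.
    console.log(chain.toString("hex"));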
Since those who say "domestic" usually mean USA, I'm guessing you live there.
Bad news: your government is one of the attackers. (So is mine, unfortunately.) I take it you've seen the NSA interdiction guys taking discreet hacksaws to Cisco parcels en route to 'implant' (backdoor) them by now?
Did you think that was something that only happens abroad?
It's not something that only happens abroad, but to my (admittedly limited) knowledge the US is a less severe problem for a US company.
That is, they snoop on you, yes. But they don't sell you shoddy knock-offs instead of real parts, they don't give intel on you to your competitors, and they don't actually attack you Stuxnet-style. (that I know of)
I could be way off base, but as a US company I'd much prefer a US gov't backdoor to a Chinese gov't backdoor, and a supply chain contaminated by knock-offs is a nightmare unto itself.
P.S. Yes, I'm in the US, and yes, you're right about "domestic"
P.P.S. IBM's fab (to my knowledge) mostly exists to serve the US gov't anyway. At least for the NSA themselves, the advantages to domestic production are there :)
While the main point of the article is interesting, some of the details don't really make sense.
For example, it would be difficult to make an instruction like fyl2x or fadd cause a privilege level change. The reason is that floating point instructions are executed on a separate unit (the FPU), with a separate decoder. This unit would not have the means to communicate back information such as "change privilege level" (normally it can only signal floating point exceptions; other than that, its only output is the floating point registers). It would make more sense to encode the backdoor on an illegal opcode, i.e. an opcode that under normal conditions would generate a UD# exception, but with the correct values in the registers it would trigger some undocumented behavior.
Another question is how to hide this backdoor in the microcode. Presumably, at some point someone might stumble upon the backdoor and ask around about it. If the backdoor depends on some "magic values", it would be relatively easy to spot just by looking at the microcode.
There's also the point that the author mentioned of "fixing" the processor at some point during the production process. I don't think that the author understands the way mass production of microchips works. It's very much not possible to do something like this while keeping the production price at the same level (or without someone noticing this extra step in the production process).
All in all, it sounds much easier to find security bugs in other parts of the system.
> The reason is that floating point instructions are executed on a separate unit (the FPU), with a separate decoder.
I don't think that has been true for a very long time.
> If the backdoor depends on some "magic values", it would be relatively easy to spot just by looking at the microcode.
The problem with both your theory and the article's theory is that nobody outside the chip companies themselves really knows how the microcode works. This reduces both the people who could pull off such a backdoor and the people who could discover it to a very small number.
A similar thing applies to your point about changes during manufacturing.
Overall, this means that CPU backdoors are a thing to be concerned about, keeping in mind that it's probably a technique that will, for a long time, be limited to the kind of people who were responsible for Stuxnet.
There are so many people involved in the design and manufacturing of a processor that I don't see how it's possible to hide a backdoor, either in the microcode or during manufacturing. We're not talking about some secret government agency, we're talking about a place with many workers around the world, with different agendas. Eventually someone will find out about the backdoor and leak information about its existence.
> There are so many people involved in the design and manufacturing of a processor that I don't see how it's possible to hide a backdoor, either in the microcode or during manufacturing. We're not talking about some secret government agency, we're talking about a place with many workers around the world, with different agendas.
However, the end result of both the CPU design process and the microcode team's work is essentially unreadable.
No-one outside Intel can read their microcode updates, as they are obfuscated in some way, possibly encrypted. This means that compromising just the last step, the people or tools doing the obfuscating, lets you output whatever you want with no-one on the team able to find out.
The same is true for the CPU design. Created masks are generally not looked at, other than to verify small spots when there seem to be bugs. Because of this, compromising the last step between the model and the mask would allow you to output whatever you want with none of the thousands of people working on it ever finding out.
It's not that they're decoded by a separate unit but that they're executed by it. Still, the values of the two operands aren't going to be available anywhere except that FPU. If the entire design team was in on it, you could probably run a signal back from the FPU to the decoder to issue an instruction to do privilege escalation. As for the idea of looking at the value of multiple registers, well, modern OoO CPUs don't have registers in the sense that the author imagines they do, and having a floating point uOp look at more than two operands would have humungous implications for the architecture. Like, Intel no longer being able to beat AMD implications.
But if you imagine emulating x86 in software, everything the author writes makes perfect sense. It would be just a tiny bit of code to do all the checks the author mentions, and it would be really easy to hide it. So I'm guessing the author is a software person without too much detailed familiarity with how modern CPUs work under the hood. And that's OK, since exploits like this are still possible; it's just that reality is really complicated, and this is an example of why it can be so hard to make up stuff convincingly[1].
If I were trying to do something like this, I'd do it in the decoder. x86 has a lot of weird prefixes you can throw together, and large constants can be included in the instruction stream. By using an invalid value as a floating point constant, I can decrease the odds of someone accidentally discovering the problem. The extra fanout in the decoder will mean a bit more power and latency, but x86 decode is already super complicated, so probably nobody will notice. And once you know you've gotten a valid trigger, I expect you should be able to issue uOps by the normal path that will do what you want. Only a few people will ever need to know.
> having a floating point uOp look at more than two operands would have humungous implications for the architecture.
Actually, in HSW and later, all FP ops can have 3 register inputs; they added one to support FMA.
However, this is irrelevant. The scenario basically everyone has been talking about is exploiting a microcoded instruction. In case you are unfamiliar with the term: when the CPU frontend reads in an instruction it deems microcoded, it stops decoding normally and instead reads a sequence of ops from the microcode buffer that corresponds to the microcoded instruction you just executed. These ops can be anything the CPU can run -- it would be completely possible to emit a bunch of floating point compares, instructions to AND their results together, then move that result to the integer flags and compare and jump on it. And for the longer microcoded ops like FP transcendentals, it would probably be possible to hide all these ops in the shadow of the normal ops, so that there would be no increase in latency.
Oh, I see. I hadn't recognized that fyl2x was a microcoded instruction. Looking at the page again, I see I totally missed the footnote, and he's completely right.
However, for FMA I assume that they break the FMA instruction into two uOps that have to be issued back to back to the same execution unit. The changes to the reorder/register renaming logic would be just too painful otherwise, especially since Intel uses a unified scheduler which also handles the integer ops.
EDIT: I suppose that I tend to forget about microcoded instructions since none of the instruction sets I've worked with directly has really had them.
> However, for FMA I assume that they break the FMA instruction into two uOps that have to be issued back to back to the same execution unit. The changes to the reorder/register renaming logic would be just too painful otherwise, especially since Intel uses a unified scheduler which also handles the integer ops.
FMAs are not split; they are executed as single instructions issued in a single cycle. There were no changes to the renaming or reorder logic for 3-op instructions. Since SNB, Intel has been using a PRF (physical register file) design, where the rename system is decoupled enough from the scheduler that there is no need for any changes.
> For example, it would be difficult to make an instruction like fyl2x or fadd cause a privilege level change. The reason is that floating point instructions are executed on a separate unit (the FPU), with a separate decoder.
This is not true. On modern Intel CPUs, FP instructions are decoded by the same decoders as all other instructions. As for the rest of the pipeline, the whole point of picking a microcoded instruction is that microcode can emit any ops.
> It would make more sense to encode the backdoor on an illegal opcode, i.e. an opcode that under normal conditions would generate a UD# exception, but with the correct values in the registers it would trigger some undocumented behavior.
This would only be exploitable through native code, and if suspicions arose, it would be detectable and it would be possible to extract the magic instruction from it.
A much better vector is a microcoded slow operation with lots of input bits, one that gets emitted by JavaScript JITs.
> Another question is how to hide this backdoor in the microcode. Presumably, at some point someone might stumble upon the backdoor and ask around about it. If the backdoor depends on some "magic values", it would be relatively easy to spot just by looking at the microcode.
Microcode for modern CPUs is signed and encoded in some manner (encrypted?). So long as the bad guys get between the team that designs the microcode and whatever process encodes it, no-one will ever even be able to see the evil ucode.
> There's also the point that the author mentioned of "fixing" the processor at some point during the production process. I don't think that the author understands the way mass production of microchips works. It's very much not possible to do something like this while keeping the production price at the same level (or without someone noticing this extra step in the production process).
It doesn't take extra steps; it takes small modifications to the masks of a few metal layers. If someone could slip their few modifications into the files before the masks are written, this would be damn near impossible to spot.
Actually you can make something that checks for a magic value without encoding the magic value in the microcode. Triggering a buffer overrun in software often requires very particular inputs, but those inputs are not explicitly encoded in the binary. If you were clever, you could make microcode that looks correct, but backdoors under the right conditions.
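To make that concrete in software terms, here is a minimal sketch: the check fires only for one specific 64-bit operand, yet that operand appears nowhere in the code, only a digest of it (the digest below is a placeholder):

    import { createHash } from "crypto";

    // Placeholder: SHA-256 digest of the secret trigger operand.
    const TRIGGER_DIGEST =
      "0000000000000000000000000000000000000000000000000000000000000000";

    function isTrigger(operand: bigint): boolean {
      const buf = Buffer.alloc(8);
      buf.writeBigUInt64LE(operand);   // the 64-bit operand being checked
      // A reader sees only a comparison against an opaque constant;
      // the magic input itself is never stored anywhere.
      return createHash("sha256").update(buf).digest("hex") === TRIGGER_DIGEST;
    }

Real microcode could bury the same trick far less obviously than a named function does.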
Modifying the masks struck me as a fairly farfetched scenario. I don't know how many spare gates a processor ships with but you would have to know the design pretty intimately to be able to include something malicious by remapping, both in terms of logic and physical constraints.
Who needs dirty, traceable CPU backdoors when Intel's SGX technology will allow them, with perfect plausible deniability, to give the NSA (or China, if it forces them by law) the key to all the "secure apps" that will be using SGX:
> Finally, a problem that is hard to ignore today, in the post-Snowden world, is the ease of backdooring this technology by Intel itself. In fact Intel doesn't need to add anything to their processors – all they need to do is to give away the private signing keys used by SGX for remote attestation. This makes for a perfectly deniable backdoor – nobody could catch Intel on this, even if the processor was analyzed transistor-by-transistor, HDL line-by-line.
Wow, that's like the entire microelectronics class I took in undergrad compressed into an hour, just without any of the math and (obviously) light on theory. Impressive.
I worked for an optical company building complete objective lenses and illuminator optics for steppers using G-line, I-line, and ArF and KrF laser sources. "Solarization" and ablation of the lens coatings was an ongoing problem, especially for UV sources. A stepper objective has 20+ lens elements. Lenses were refurbished by replacing elements as their transmission fell.
Cool article. I didn't understand how the privilege escalation would be exploited. Obviously if the attacker already has access to the box, he can get root with this exploit.
I think a chip backdoor could also be based on information leaking rather than executing arbitrary code.
The steps would be:
1. Identify critical info, like crypto keys, from heuristics. This means keeping a special buffer, since you don't know at the beginning of an RSA operation that it's an RSA operation. The heuristics are not perfect, of course, but work with standard apps like Firefox, GPG and Outlook.
2. Exfiltrate the info. Via spread-spectrum RF, timing jitter in packets, or replacing random numbers in crypto. The article implies that since OSes and apps mix the hardware RNG with other sources, there's no point in subverting it. But the CPU can recognize common mix patterns, like in the Linux kernel, and subvert the final output.
In this case the output entropy is good, but also leaks some secret to a listener who has the right keys.
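Here is a minimal sketch of that step-2 idea, output that looks random but leaks a secret, using a hash-based keystream; the key and the whole construction are illustrative assumptions, not any real CPU's design:

    import { createHash, randomBytes } from "crypto";

    const ATTACKER_KEY = Buffer.alloc(16);  // placeholder: known only to the listener

    // Returns bytes that pass statistical randomness tests, but from which
    // anyone holding ATTACKER_KEY can recover `secret`.
    function backdooredRandom(secret: Buffer): Buffer {
      const nonce = randomBytes(16);
      const stream = Buffer.alloc(secret.length + 32);
      for (let i = 0; i * 32 < secret.length; i++) {
        createHash("sha256")
          .update(Buffer.concat([ATTACKER_KEY, nonce, Buffer.from([i])]))
          .digest()
          .copy(stream, i * 32);             // keyed keystream block
      }
      const masked = Buffer.alloc(secret.length);
      for (let j = 0; j < secret.length; j++) {
        masked[j] = secret[j] ^ stream[j];   // XOR the secret into the "entropy"
      }
      // The listener recomputes the keystream from the nonce and XORs it back out.
      return Buffer.concat([nonce, masked]);
    }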
1. You design your CPU so that whenever you execute an add instruction with $r1 = x, $r2 = y (say these are the add inputs), the next add instruction will switch to ring-0 mode and run code at the address that is the result of that add.
2. You don't need access to the box. You just get the target to load a site with JS that sets x and y to those specific values and adds them, and then adds zero to some address you want to execute (which you can aim to be shellcode in a JS string or something, but even if not, there are a million tricks you can use to execute arbitrary code if you can run code at an arbitrary address).
3. Assuming the JS engine compiles sanely, you now have a way to control any computer and make it do anything via some JS on any web site. Ring-0 can totally bypass all virtualization and even the OS itself.
And this exploit is so simple and powerful that everything else is a waste of time. No need to use statistical and entropy tricks to leak keys. You can own any computer with JS on a web site and steal anything you want from memory, including keys. And you can probably do this without the target noticing anything.
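To make step 2 concrete, here is a minimal sketch of what the attacker's page might run; the operand pair, the target address, and the assumption that the JIT emits literal 32-bit adds are all hypothetical:

    // Hypothetical magic operands and target address. A real attack would have
    // to obtain them at runtime (e.g. from fetched data) so the JIT can't
    // constant-fold the adds away.
    const x = 0x13371337 | 0;
    const y = 0x2bad2bad | 0;
    const targetAddr = 0x41414100 | 0;  // where the shellcode is assumed to sit

    let armed = (x + y) | 0;            // first add: the magic operand pair
    let go = (targetAddr + 0) | 0;      // second add: the backdoored CPU runs ring-0 code at its result
    console.log(armed, go);             // keep the results live so they aren't optimized out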
I'm not clear on the "million tricks" though. Suppose you embed some machine language in a JavaScript string and use your r1/r2 trick - how can you compute the address of the machine language you want to execute?
Or are you saying there are existing chunks of machine language in the OS or apps that would be useful to remote execute, with no arguments?
But, given that (almost) everyone runs browsers for which the binaries, at least, are available, it's relatively straightforward to come up with something that triggers the vulnerability.
And even if you don't know what browser they are using, you can make guesses. For example - if you have an image that's 102px wide and immediately to the right of an image that's 57px wide, at some point the CPU will probably add 57 and 102. Things like that.
CPU backdoors are a very real concern, but the problem isn't limited to the CPU: the growing complexity of the motherboard chipset matters too. For example, a malicious memory controller could manipulate data on the way to the CPU, causing a faithful CPU to do malicious things.
For highly secured systems, this is of growing concern. With the amount of hardware made in China, the supply chain is considered a significant attack surface that has to be taken into account when sourcing electronics.
I abandoned x86 a few years back, because I'm far less concerned about China knowing my secrets than I am with the Five Eyes countries violating my privacy. The likelihood of the nsa or gchq tampering with an allwinner or freescale chip en route is much lower than with Intel or AMD. And far more resources would be involved than would be financially reasonable to tailor an operation for a small-potatoes corporation running an all-ARM setup like mine.
So I'm seeing it as a decrease in attack surface, overall.
Given the fact that the NSA targets Linux users [0], is it really that far-fetched that they could be adding backdoors to CPUs ordered by certain NSA targets?
I'm assuming most linux enthusiasts build their own rigs, as do I.
Well, to be realistic, almost all Linux users are of no actual interest to the NSA. All it means is that, statistically, someone browsing about encryption is more likely to be thinking about committing a crime. If you count people who go the extra mile to do heavy encryption for privacy reasons, and people who do that to hide crimes... that could be interesting.
The Linux Journal is, in some way, an extremist forum - people who are extremely technically advanced. Being extremist doesn't mean that you are a terrorist.
Imagine you have the browsing history of every convict for some specific crime and are tasked with deriving a scoring formula. You'd probably see that there's a positive correlation between hasBrowsedLinuxJournal and isConvicted.
Using Linux doesn't put you on the kill list. It just means you share something in your behavior with people who are of interest to national security. There probably are many more factors like that - shopping patterns, movement patterns, etc. It's just that Linux made the headlines and media took chance to generate some hype.
for many modern desktops/laptops (including recent Apple machines, which i don't think was the case even just a few product cycles ago), Intel's vPro appears capable of many forms of surveillance/subversion.
in terms of understanding/mitigating these types of threats, i wish an open, crowdfunded project to reverse engineer the contents of intel's microcode updates existed to the point they were understandable by the tech press.
i also wish an easy-to-use package for blacklisting cpu-based and crypto-related kernel modules (like aes-ni) existed for a broad range of processors..
and of course only somewhat relatedly, i continue to wish the man page for random(4) would be rewritten in light of the risk of these types of backdoors.
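On the module-blacklisting wish, the closest thing today is a modprobe config; a minimal sketch is below (module names vary by kernel and CPU, so check lsmod on your own machine first - this is illustrative, not a hardening guarantee):

    # /etc/modprobe.d/no-cpu-crypto.conf  (hypothetical file name)
    blacklist aesni_intel
    blacklist ghash_clmulni_intel
    # "blacklist" only stops automatic loading; an "install" override also
    # blocks explicit modprobe requests.
    install aesni_intel /bin/false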
> to reverse engineer the contents of intel's microcode updates
I don't think that's (sanely) possible. The amount of information needed about the silicon is very high, and modern x86 processors are (for now) pretty much impossible to reverse engineer by delayering and taking pictures of the insides of the chip (14nm = ~60 atoms)... Also, the cost of people able to reverse engineer such stuff would be very, very high.
It seems very unlikely that someone would be able to "apply the edit to a partially finished chip". Adding a fix like this is probably one of the most scrutinized processes in hardware design. After spending years designing and verifying chip functionality and getting the timing exactly right before production starts, there is a very high bar for getting these fixes into the production flow, because if the fix screws anything else up you are FUBARed. Given that, it is probably the hardest place you could ever try to put a backdoor.