Hacker News
Ken Thompson on the bug that exposed his compiler trojan (2021) (tuhs.org)
224 points by throwawaylinux on June 19, 2023 | 33 comments



Required (re)reading whenever trusting trust comes up: https://www.teamten.com/lawrence/writings/coding-machines/


Thanks, that was really good and with amazingly convincing detail of the technical things. Wow! Now: scared. Eh.

The most jarring thing (to me) on a quick perusal was the idea that a programmer would walk to a Peet's coffee in another building (from the sound of it, a bit of a walk at least) and return, and still be drinking from an espresso. I mean those things are tiny, and meant by (The Italian Coffee Overlords) to be drunk more or less like a shot straight at the bar. Whatever.

Obviously, this being a human behavior thing, there are going to be hundreds of people who walk around with espressos and sip on them for hours, it's just ... a bit off, at least to me.


That was a wild read. Thanks for sharing. Lesson learned, but now I have to go "walk it off" haha.


Wait, I'm confused. That's fiction, right?

EDIT: As in, the text clearly looks like fiction on its own, but it's being posted here as a companion piece to a series of events that definitely happened, hence my confusion.



Fiction. As far as the NSA wants you to know, anyways.


I'm confused too.


That article is a gem.

> “This is crazy. We can’t be writing a compiler in assembly. No way.”

> “It’s not that awful,” said Patrick

Dear lord.


So to clarify, they recompiled the compromised compiler using the compromised compiler and it kept growing in size with each iteration?

(See Reflections on Trusting Trust if you are completely confused about what this is.)



I'm still not sure I believe this.

I heard about this long ago, and understood it to be a thought experiment, if an ominous one.

And I was shocked to hear reports that it was "real" and "live" in the C compiler (which one? GNU?)

But I'm not sure if I believe this; could Ken still be pulling our leg at this point?

What is the nature of the malware or backdoor? How is it accessed? What does it do? Does it grant root privileges to an unprivileged user? Does it execute untrusted code? Does it open a socket for a RAT? Does it work under Windows? MacOS? Android?

Is Ken keeping mum about the nature and capabilities of the backdoor so he still has leverage?

If this is true, it's a security concern about the size of Intel's IME, don't you think?


That is so neat! I never did read the paper itself on trusting trust or whatever it was called, so I always thought this was mainly a theoretical kind of thing. Didn't know that the man actually made a real-life proof-of-concept for that exploit!


Reflections on trusting trust

It's only 3 pages and worth the read:

https://dl.acm.org/doi/10.1145/358198.358210

https://dl.acm.org/doi/pdf/10.1145/358198.358210


it is very real indeed. I've heard tale of this kind of thing being used in the wild one time, and there must be use of this kind of exploit in the wild that has not been detected.

read the paper, it will scare you at least a little if you understand what it lays out.

we really do rely on the hope that our compilers are pure, and we have very few tools to detect a bad compiler if our tools are also compiled with a malicious compiler. even if we compile the compiler from source, we can't know, because the compiler itself could be "in on it."


> we really do rely on the hope that our compilers are pure

Jeremiah Orians hacked his way through the whole supply chain up to raw machine code to get a provably clean, up-to-date GCC for Linux on amd64¹, solving the bootstrapping problem in a complete way. He and some Guix people have also then worked to integrate this into GNU Guix (a cross-distro package manager) and GuixSD (a GNU operating system based on that package manager)², so it's actually not too hard to make practical use of that work, either!

Imo, this is an incredible achievement that deserves much wider recognition. It must have taken a very principled, curious, obsessive, stubborn personality to even seriously take up this work. Pretty damn cool that it even happened.

--

1: https://savannah.nongnu.org/projects/stage0

2: https://guix.gnu.org/manual/devel/en/html_node/Full_002dSour...


I forgot the guy's name and fucked up by only looking at some of the most recent commits. Another hacker to highlight, and the one whose lectures taught me about these efforts when I found them on YouTube, is Jan Nieuwenhuizen, who goes by janneke online.

He's the author of GNU MES (Maxwell's Equations of Software), the scheme interpreter used in this bootstrap effort, and IIRC he's worked on many parts of this whole thing.

As a bit of an apology as well as a followup, here are some talks he gave a few years ago about this whole bootstrap story!

janneke's talk from FOSDEM 2017: https://youtu.be/mhopx8J2Z8s

janneke's talk from FOSDEM 2020: https://youtu.be/XvVW80dDF8I

I'm a real fan but I'm only a spectator and my memory sucks. Sorry :(


This is damn impressive!

I am sure most of us have never even thought along these lines! Have to spend some time trying to "grasp" it.

Thanks for posting this.


There is more than just hope to rely on.

Diverse Double-Compiling[0] can provably detect this class of attack.

[0]https://dwheeler.com/trusting-trust/
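
A rough toy model of the DDC idea, for anyone who wants to see the shape of the check. Everything here is an assumption made for illustration: `Binary`, `run_compiler`, `ddc_matches` and the source names are hypothetical, builds are assumed deterministic, and the trusted compiler is assumed not to carry the same trojan. You rebuild the compiler source with an independent trusted compiler, use that result to rebuild the source once more, and compare with the suspect binary.

```python
from typing import NamedTuple

class Binary(NamedTuple):
    implements: str   # source this binary was compiled from
    codegen: str      # source of the compiler whose code generator emitted it
    infected: bool    # True if it carries a self-propagating trojan

A_SRC = "compiler-A-source"   # hypothetical clean source of compiler A

def run_compiler(binary: Binary, source: str) -> Binary:
    """Toy compile step: output depends on what `binary` implements, not on who built it."""
    return Binary(
        implements=source,
        codegen=binary.implements,
        # The trojan only re-inserts itself when it recognises the compiler source.
        infected=binary.infected and source == A_SRC,
    )

def ddc_matches(suspect: Binary, trusted: Binary) -> bool:
    """Diverse double-compiling: two rebuilds starting from a trusted compiler."""
    stage1 = run_compiler(trusted, A_SRC)   # A's source built by the trusted compiler
    stage2 = run_compiler(stage1, A_SRC)    # A's source built by that stage1 result
    return stage2 == suspect                # a bit-for-bit comparison in real life

trusted = Binary("compiler-T-source", "vendor-toolchain", False)
honest = Binary(A_SRC, A_SRC, False)    # what an honest self-compile would produce
trojaned = Binary(A_SRC, A_SRC, True)   # same claimed source, plus the trojan

honest_ok = ddc_matches(honest, trusted)
trojaned_ok = ddc_matches(trojaned, trusted)
print(honest_ok, trojaned_ok)
```

In this model the honest binary matches and the trojaned one does not, because the trojan never gets a chance to run during the trusted rebuild.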


Under certain assumptions. This method relies on making its assumptions expensive to violate. Which is good enough in practice...

...unless you're dealing with an attacker with vastly more resources than you, and a will to spend it. It's always worth keeping in mind that the way magic tricks work is usually because the performer invested much more time and effort in preparation and practice than anyone in the audience would consider reasonable.


When I learned about it, our professor told us it was an "if I did it..." type of scenario. Very cool to see from the mailing list that it was more than a hypothetical


heard tell


I did read the paper, I always thought it was theoretical too. Then I saw a video with maddog saying he witnessed Ken logging in using his backdoor.


Lesson learned: It can't show up in the symbol table.


Reverse engineering the dump through assembler back into inferred intent in a higher language would have shown some very odd things being done. You would go "ok this is some crt0 initialisation, it must be needed". But some rigour would reveal that what it does has nothing to do with runtime initialisation; it's just a quine reproducing its nefarious intent inside the code.

Which I think goes to the idea that reflections on trusting trust in the end is a reflection on what trust you have with other people: either not to do this, or to be able to ask the right people to infer this hasn't been done, and have reason to believe them.

If the tools they use are made by the person who did the embedding, your trust is highly conditional. It argues for diversity of tooling.

LLVM should check GCC and vice-versa?


Yeah but what if the disassembler is in cahoots with the compiler and knows not to dump the suspicious part?


I presume pwb here is not a person but the section responsible for releasing a production version of UNIX v6 known as PWB.

https://gunkies.org/wiki/PWB/UNIX


> the extra byte was my bug.

Well, the one that broke it at least. Attacks of this nature are inherently brittle.


Reproducible builds are important!
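
A minimal sketch of what the check boils down to, assuming a made-up `build` function that stands in for a real toolchain (nothing here is a real build system's API): build the same source twice and compare digests. Embedding a timestamp is one common reason two builds differ.

```python
import hashlib

def build(source: bytes, timestamp: int, embed_timestamp: bool) -> bytes:
    """Hypothetical build step: the artifact is a function of its inputs only."""
    artifact = b"BIN:" + source
    if embed_timestamp:
        # Non-reproducible: the output now depends on when the build ran.
        artifact += str(timestamp).encode()
    return artifact

def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()

src = b"int main(void) { return 0; }"

# Two independent builds at different (made-up) times.
reproducible = digest(build(src, 1000, False)) == digest(build(src, 2000, False))
with_timestamp = digest(build(src, 1000, True)) == digest(build(src, 2000, True))
print(reproducible, with_timestamp)
```

Matching digests only say that both builders produced the same bytes. If both used the same compromised compiler, they would still agree.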


NixOS


Why would it get bigger every time? Did it keep getting bigger if you used the same compiler to compile the source in a loop?


Because it was a bug. Maybe he always appended a trailing zero byte ('\0') to some string.

In general, if you're not aware of the content of the paper, you might want to familiarize yourself with it. In short, and maybe simplified: the pre-compiled compiler contains malware that, when compiling the compiler source, will insert itself again. (Of course, the source does not contain the malware.) They used the pre-compiled compiler ("stolen" from Ken) to compile the compiler's source code; this of course inserted the malware into the newly built compiler. Then they used the newly built compiler to compile the compiler's source code again, and so on. Every time they did that, it grew by one byte because of a bug in the malware component.
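
The propagation described above can be sketched as a toy model. This is not Thompson's code; the dict-based "binaries", `compile_with`, and the source names are all hypothetical stand-ins chosen for illustration.

```python
# Toy model: a "binary" is a dict describing how a compiled program behaves.
CLEAN_COMPILER_SOURCE = "compiler-source"   # hypothetical; contains no malware
LOGIN_SOURCE = "login-source"               # hypothetical target program

def compile_with(compiler_binary: dict, source: str) -> dict:
    """Run a compiler binary on some source and return the resulting binary."""
    output = {"built_from": source, "backdoor": False, "infects": False}
    if compiler_binary["infects"]:
        if source == LOGIN_SOURCE:
            output["backdoor"] = True    # miscompile the target program
        elif source == CLEAN_COMPILER_SOURCE:
            output["infects"] = True     # re-insert the trojan into the new compiler
    return output

# The pre-compiled compiler that already carries the trojan.
trojaned = {"built_from": CLEAN_COMPILER_SOURCE, "backdoor": False, "infects": True}

gen1 = compile_with(trojaned, CLEAN_COMPILER_SOURCE)   # rebuilt from clean source
gen2 = compile_with(gen1, CLEAN_COMPILER_SOURCE)       # and rebuilt again
login = compile_with(gen2, LOGIN_SOURCE)
print(gen2["infects"], login["backdoor"])
```

Both generations were built from source that contains no malware, yet each one still carries the trojan, because the infection lives only in the binary doing the compiling.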


The compiler copies a part of itself directly into the new compiler binary (i.e. it is a type of quine).

If a bug causes that copy mechanism to make the "part to be copied" one byte larger than it should be, then each time the compiler compiles itself, that part, and therefore the compiler itself, will be one byte larger than the time before.
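
A small simulation of that growth, under the assumption (a guess, not something the mailing list post confirms) that the bug appends one extra zero byte to the copied part on every self-compile. `CLEAN_CODE`, `self_compile`, and the payload bytes are hypothetical.

```python
CLEAN_CODE = b"C" * 100   # stands in for the honestly compiled part of the compiler

def self_compile(payload: bytes) -> bytes:
    """Return the self-reproducing part as it lands in the next compiler binary."""
    return payload + b"\0"   # hypothetical bug: one byte too many is copied

payload = b"TROJAN"   # stands in for the self-reproducing part
sizes = []
for generation in range(4):
    sizes.append(len(CLEAN_CODE) + len(payload))   # size of this generation's binary
    payload = self_compile(payload)

print(sizes)   # [106, 107, 108, 109]
```

A compiler that gets one byte larger every time it compiles its own unchanged source is exactly the kind of oddity that makes someone go looking.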


It was a bug.



