Excuse the prolix comment; I'm not feeling well today.
I think people who follow both me and Colin on HN know that I have a lot of respect both for him and for Tarsnap, the service he runs, which is the only encrypted backup service I have ever recommended to anyone and which is to this day my go-to recommendation for people looking to safely store data in the cloud. Colin has built one of the very few modern cryptosystems I actually trust.
First, let me dodge Colin's whole post. My Twitter post was:
If you’re not learning crypto by coding attacks, you might not actually be learning crypto.
(I was cheerleading people doing our crypto challenges [http://www.matasano.com/articles/crypto-challenges/] and didn't think much of my tweet; I didn't exactly try to nail it to the door of the All Saints Church).
Note the word "might". I specifically chose the word "might" thinking "COLIN PERCIVAL MIGHT READ THIS". Colin, "might" means "unless you're Colin".
Anyways: I think the point Colin is making is valid, but is much more subtle than he thinks it is.
Here's what's challenging about understanding Colin's point: in the real world, there are two different kinds of practical cryptography: cryptographic design and software design. Colin happens to work on both levels. But most people work on one or the other.
In the world of cryptographic design, Colin's point about attacks being irrelevant to understanding modern crypto is clearly valid. Modern cryptosystems were designed not just to account for prior attacks but, as much as possible, to moot them entirely. A modern 2010's-era cryptosystem might for instance be designed to minimize dependencies on randomness, to assume the whole system is encrypt-then-MAC integrity checked, to provide forward secrecy, to avoid leaking innocuous-seeming details like lengths, &c.
While I think it's helpful to understand the attacks on 1990's-era crypto so you can grok what motivates the features of a 2010's-era ("Crypto 3.0") system, Colin is right to point out that no well-designed modern system is going to be vulnerable to a (say) padding oracle, or an RSA padding attack (modern cryptosystems avoid RSA anyways), or a hash length extension.
In this sense, learning how to implement a padding oracle attack (which depends both on a side channel leak of error information and on the failure to appropriately authenticate ciphertext, which would never happen in a competent modern design) is a little like learning how to fix a stuck carburetor with a pencil shaft.
The deceptive subtlety of Colin's point comes when you see how cryptography is implemented in the real world. In reality, very few people have Colin's qualifications. I don't simply mean that they're unlike Colin in not being able to design their own crypto constructions (although they can't, and Colin can). I mean that they don't have access to the modern algorithms and constructions Colin is working with; in fact, they don't even have intellectual access to those things.
Instead, modern cryptographic software developers work from a grab-bag of '80s-'90s-era primitives. A new cryptosystem implemented in 2013 is, sorry to say, more likely to use ECB mode AES than it is to use an authenticated encryption construction. Most new crypto software doesn't even attempt to authenticate ciphertext; cryptographic software developers share a pervasive misapprehension that encryption provides a form of authentication (because tampering with the ciphertext irretrievably garbles the output).
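A minimal sketch of the problem, using Python's "cryptography" package (the key and message here are hypothetical): under ECB, identical plaintext blocks encrypt to identical ciphertext blocks, and nothing detects tampering with the ciphertext.

    # Sketch: why raw ECB is dangerous. Identical plaintext blocks encrypt
    # to identical ciphertext blocks, and nothing authenticates the result.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(16)
    pt = b"ATTACK AT DAWN!!" * 2              # two identical 16-byte blocks

    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    ct = enc.update(pt) + enc.finalize()
    assert ct[:16] == ct[16:32]               # ECB leaks the repetition

    # Swap the two ciphertext blocks: decryption "succeeds" silently,
    # because nothing authenticates the ciphertext.
    tampered = ct[16:32] + ct[:16]
    dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    print(dec.update(tampered) + dec.finalize())   # reordered, no error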
I think it's telling that Colin breaks this out into '90s-crypto and 2010's-crypto. For instance:
http://www.daemonology.net/blog/2011-01-18-tarsnap-critical-...
This is an AES CTR nonce reuse bug in Colin's software from 2011. Colin knew about this class of bug long before he wrote Tarsnap, but, like all bugs, it took time for him to become aware of it. Perhaps he'd have learned about it sooner had more people learned how cryptography actually works, by coding attacks, rather than reading books and coding crypto tools; after all, Colin circulates the code to Tarsnap so people can find exactly these kinds of bugs. Unfortunately, the population of people who can spot bugs like this in 2010's-era crypto code is very limited, because, again, people don't learn how to implement attacks.
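To make the stakes concrete, here's a hedged sketch (again Python's "cryptography" package; the key, nonce, and messages are hypothetical) of what CTR nonce reuse costs you: two ciphertexts produced under the same key and nonce XOR to the XOR of the plaintexts, so knowing one message reveals the other, no key required.

    # Sketch: CTR nonce reuse. c1 XOR c2 == p1 XOR p2, so an attacker who
    # knows (or guesses) one plaintext recovers the other without the key.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(16), os.urandom(16)

    def ctr_encrypt(msg: bytes) -> bytes:
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(msg) + enc.finalize()

    c1 = ctr_encrypt(b"transfer $100 to alice")
    c2 = ctr_encrypt(b"transfer $999 to chuck")   # same key+nonce: the bug

    xored = bytes(a ^ b for a, b in zip(c1, c2))  # == p1 XOR p2
    p2 = bytes(a ^ b for a, b in zip(xored, b"transfer $100 to alice"))
    print(p2)                                     # b"transfer $999 to chuck"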
But I'll push my argument further, on two fronts.
First: Colin should account for the fact that there's a significant set of practical attacks that his approach to cryptography doesn't address: side channels. All the proofs in the world don't help you if the branch target buffer on the CPU you share with 10 other anonymous EC2 users is effectively recording traces of your key information.
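Colin's branch-target-buffer scenario is microarchitectural and can't be shown in a few lines, but the same principle turns up constantly in ordinary application code. A sketch of the most common instance, a timing side channel in MAC verification (the naive comparison is the bug; the standard-library fix follows):

    # A naive byte-by-byte comparison exits at the first mismatch, so
    # response timing leaks how many leading bytes of the MAC were correct.
    import hmac

    def leaky_verify(expected: bytes, provided: bytes) -> bool:
        if len(expected) != len(provided):
            return False
        for a, b in zip(expected, provided):
            if a != b:
                return False       # early exit: timing depends on the secret
        return True

    def safe_verify(expected: bytes, provided: bytes) -> bool:
        # Constant-time comparison from the Python standard library.
        return hmac.compare_digest(expected, provided)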
Second: Colin should account for the new frontiers in implementation attacks. It's easy for Colin to rely on the resilience of "modern" 2010's-era crypto when all he has to consider is AES-CTR, a random number generator, and SHA3. But what about signature systems and public key? Is Colin so sure that the proofs he has available to him account for all the mistakes he could make with elliptic curve? Because 10 years from now, that's what everyone's going to be using to key AES.
So, I disagree with Colin. I think it's easy for him to suggest that attacks aren't worth knowing because (a) he happens to know them all already and (b) he happens to be close enough to the literature to know which constructions have the best theoretical safety margin and (c) he has the luxury of building his own systems from scratch that deliberately minimize his exposure to new crypto attacks, which isn't true of (for instance) anyone using ECC.
But more importantly, I think most people who "learn crypto" aren't Colin. To them, "learning crypto" means understanding what the acronyms mean well enough to get a Java application working that produces ciphertext that looks random and decrypts to plaintext that they can read. Those people, the people designing systems based on what they read in _Applied Cryptography_, badly need to understand crypto attacks before they put code based on their own crypto decisions into production.
Excuse the prolix reply; I have a flight to catch.
As I wrote in my blog post, I have a lot of respect for Thomas. He's who I usually point people at when they want their code audited. I really hate reading other people's code and I trust Thomas (well, Matasano) will do a good job.
two different kinds of practical cryptography: cryptographic design and software design
Agreed.
Colin happens to work on both levels. But most people work on one or the other.
I'm generally writing for an audience of people who already know how to write software, but want to know something about crypto. So I take one as given and focus on the other.
modern cryptographic software developers work from a grab-bag of '80s-'90s-era primitives
Right, and that's exactly what I'm trying to change through blog posts and conference talks. We know how to do crypto properly now!
This is an AES CTR nonce reuse bug in Colin's software from 2011. Colin knew about this class of bug long before he wrote Tarsnap, but, like all bugs, it took time for him to become aware of it.
To be fair, that was not a crypto bug in the sense of "got the crypto wrong" -- you can see that in earlier versions of the code I had it right. It was a dumb software bug introduced by refactoring, with catastrophic consequences -- but not inherently different from accidentally zeroing a password buffer before being finished with it, or failing to check for errors when reading entropy from /dev/random. Any software developer could have compared the two relevant versions of the Tarsnap code and said "hey, this refactoring changed behaviour", and any software developer could have looked at the vulnerable version and said "hey, this variable doesn't vary", without needing to know anything about cryptography -- and certainly without knowing how to implement attacks.
Unfortunately, the population of people who can spot bugs like this in 2010's-era crypto code is very limited, because, again, people don't learn how to implement attacks.
Taking my personal bug out of the picture and talking about nonce-reuse bugs generally: You still don't need to learn how to implement attacks to catch them. What you need is to know the theory -- CTR mode provides privacy assuming a strong block cipher is used and nonces are unique -- and then verify that the preconditions are satisfied.
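For what it's worth, that precondition can also be made structural rather than verified by eye. A hypothetical sketch of one way to do it: derive CTR nonces from a monotonic counter so that reuse within a session is impossible by construction (the per-session prefix must itself be unique per key).

    # Hypothetical sketch: issue unique 16-byte CTR nonces from a counter,
    # so nonce reuse within a session cannot survive a refactoring.
    import struct

    class NonceCounter:
        def __init__(self, session_prefix: bytes):
            assert len(session_prefix) == 8    # unique per session and key
            self.prefix = session_prefix
            self.counter = 0

        def next_nonce(self) -> bytes:
            nonce = self.prefix + struct.pack(">Q", self.counter)
            self.counter += 1                  # the variable that must vary
            return nonce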
To be fair, that was not a crypto bug in the sense of "got the crypto wrong" -- you can see that in earlier versions of the code I had it right. It was a dumb software bug introduced by refactoring, with catastrophic consequences
Isn't that the entire basis of 'tptacek's argument, though? That even you, as an expert in both software development and cryptography, accidentally got something wrong? An engineering fault occurred, to an expert practitioner. This seems to suggest this sort of thing is not just a function of pure science.
EDIT: On a more serious note, isn't crypto both science and engineering? We have the theoretical aspects, etc... Then we have the practical aspects of implementing these systems in production within an ecosystem that is constantly fighting entropy. I declare a draw.
You still don't need to learn how to implement attacks to catch them.
Implementing attacks is a good way to internalize the idea that "Oh shit, this isn't just a theoretical attack, I better be super careful when doing X, Y, and Z."
There is definitely a kind of crypto attack that isn't very practical to know. For instance, you don't need to understand differential cryptanalysis unless you plan to design your own block cipher, which you should never do anyways.
> Instead, modern cryptographic software developers work from a grab-bag of '80s-'90s-era primitives. A new cryptosystem implemented in 2013 is, sorry to say, more likely to use ECB mode AES than it is to use an authenticated encryption construction.
I think Fravia said something similar. He was talking about copy-protection dongles. He respected the cryptography provided by some of the hardware manufacturers, but was dismissive of the broken ways software vendors actually deployed it.
> But more importantly, I think most people who "learn crypto" aren't Colin. To them, "learning crypto" means understanding what the acronyms mean well enough to get a Java application working that produces ciphertext that looks random and decrypts to plaintext that they can read. Those people, the people designing systems based on what they read in _Applied Cryptography_, badly need to understand crypto attacks before they put code based on their own crypto decisions into production.
Oh god yes.
These people need to understand that when someone says "This is broken for a whole slew of reasons. No, I'm not going to code a proof of concept crack." it probably means that the crypto is very broken, and should not be pushed out to production, and certainly should not be promoted as safe and unbreakable and suitable for use by political dissidents in oppressive regimes.
It doesn't mean "We know we can break it faster than we can brute force it, even if there's no practical attack yet".
"Colin should account for the fact that there's a significant set of practical attacks that his approach to cryptography doesn't address: side channels. All the proofs in the world don't help you if the branch target buffer on the CPU you share with 10 other anonymous EC2 users is effectively recording traces of your key information."
Well, it is not practical, but you can prove that an algorithm has no side channels; e.g., you can use the construction of Pippenger and Fischer to create an oblivious version of any algorithm. To put it another way, if you could not prove that there were no side channels in an algorithm, you could never prove the security of something like FHE. Even if we assumed a perfect world where implementations were never wrong, practical concerns would still be a drag on the value of security proofs. We do not use AES because we can prove it is secure; we use it because it is fast and "secure enough."
"Those people, the people designing systems based on what they read in _Applied Cryptography_, badly need to understand crypto attacks before they put code based on their own crypto decisions into production."
I am not sure understanding the particular attacks we know so far is really important here. More than anything, I think people need to understand that attacks in general occur where abstractions fail. The closer you stick to the abstraction assumed in a security proof, the more secure your system will be (ignoring implementation bugs). If ever there was a place where premature optimization is a bad idea, it is in the implementation of cryptosystems.
I don't understand your first point, but that could just be the flu talking.
The second point though, I think the opposite is true. You need to understand in your gut that parameters to number-theoretic crypto can be proposed specifically to make your math fail; you need to understand that even if flipping a single bit in your ciphertext garbles the output, attackers can still do useful things with that property; you need to understand that being able to coerce a system into producing the same ciphertext block for the same plaintext block admits terrible attacks; you need to understand where systems "want" randomness versus where they absolutely require it.
It's not enough to know that errors "happen". You have to be able to predict them.
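A sketch of that bit-flipping point (Python's "cryptography" package; key, nonce, and message are hypothetical). In unauthenticated CTR mode the situation is even worse than "garbled output": flipping a ciphertext bit flips exactly the corresponding plaintext bit, so an attacker who knows the message format can make surgical edits without the key.

    # Sketch: bit-flipping malleability in unauthenticated CTR mode.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(16), os.urandom(16)
    pt = b"amount=0100"
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ct = bytearray(enc.update(pt) + enc.finalize())

    # The attacker knows the format and flips bits to turn "0100" into
    # "9900" -- no key, no decryption, just XOR on the ciphertext.
    offset = pt.index(b"0100")
    for i, (old, new) in enumerate(zip(b"0100", b"9900")):
        ct[offset + i] ^= old ^ new

    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    print(dec.update(bytes(ct)) + dec.finalize())  # b"amount=9900"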
> after all, Colin circulates the code to Tarsnap so people can find exactly these kinds of bugs. Unfortunately, the population of people who can spot bugs like this in 2010's-era crypto code is very limited, because, again, people don't learn how to implement attacks.
I disagree. There is so much code out there in open source projects, including crypto code, that you either have to wait for someone to explicitly review it or for someone particularly interested in the project to find the bug.
As for your other points, I don't disagree with your argument, but I think it's more important, in this order, to 1) know the mathematical aspects behind what you implement, 2) know how to implement crypto (by having studied different open source projects), and 3) know all the main attacks at the code level. Ideally one should have a good knowledge of all three before feeling confident in one's code.
Learning how to implement crypto by studying open source projects is a recipe for getting people's sites plastered across Pastebin.
It is exactly this 1-2-3 approach to learning that I was thinking about when I wrote the fateful tweet. How do you evaluate whether software is making proper choices? Why do you assume popular open source packages are secure? They often aren't; in fact, they're broken in meaningful ways more often than not.
It's the engineering equivalent of a game of telephone; you copy the errors of the systems you crib from, which are multifarious, and at the same time introduce new ones because human nature has you working hard only so long as there's a payoff, and 99.999% of the payoff in this approach happens once your system round-trips properly; you miss all the subtleties that happen after round-tripping works.
Yeah, that's always your mantra. I'm glad for you that you inherited your skills from god and think nobody else is able to discern good code from bad code, or well-implemented crypto from badly implemented crypto. But I dare say it doesn't take a genius to know the quality of DJB's code, and I think you'll agree that just by studying how his code is implemented it's possible to learn a lot. In this day and age there are several open source projects with good code quality. And I don't assume anything about open source projects; I only say that, knowing who wrote them, I can expect an overall good quality, and even then you can still keep a critical eye. For instance, I don't know much about pairings, so the first thing I would do after reading the theory is try to find a decent library implementing them, just to learn a bit more; that doesn't commit me to anything.
Let me put it this way: the approach you've outlined is neither Colin's nor mine. If you want to learn by writing proofs for every single aspect of your system, go ahead.
Neither Colin nor I were suggesting that you could hope to learn how to build secure cryptography by cribbing code from open source projects. Colin isn't just saying "understand the math"; he's saying, "build provable systems, then prove them".
the people designing systems based on what they read in _Applied Cryptography_
I realize this was a side remark in your post, but should I take this to mean that, in your opinion (maybe even the consensus?), Applied Cryptography is outdated? Or just that when somebody needs AC to implement their crypto, they don't understand crypto well enough to do it well? Or something else entirely?
(Asking because although I don't use crypto much, I do still use AC to get a handle on the high-level concepts; it was _the_ recommended book when I bought it in the late 1990's)
The general critique of Applied Cryptography seems to be that it spends a lot of time on high-level meta-discussion and too little (or no) time on all the tiny details that are so important to get right when designing crypto systems, and by doing so it creates a false sense of understanding. Basically it's a good book for non-practitioners to get up to speed on terminology and basic concepts, but not a good book for people who wish to actually design good crypto systems.
In the late 90's, the book recommended to me by academics in the field was Handbook of Applied Cryptography. It's a lot more academic and mathematical, and not as mass-market friendly as Applied Cryptography, but it is also a lot more accurate for people wanting a fundamental mathematical and theoretical grounding in what's going on.
>Most new crypto software doesn't even attempt to authenticate ciphertext; cryptographic software developers share a pervasive misapprehension that encryption provides a form of authentication (because tampering with the ciphertext irretrievably garbles the output).
Can you elaborate on this? What is the most common scenario where somebody gets this wrong?
Think of the ECB, CBC, OFB, CFB, CTR, and XTS modes used when encrypting with AES. These block cipher modes ensure confidentiality, but they do not protect against malicious tampering. Someone can delete part of the encrypted message and it will still decrypt fine, giving the false impression that it's the original message. If I send you an encrypted message, "sell all IBM shares when price hits 120," and it's truncated to "sell all IBM shares," that would be disastrous.
Adding an HMAC/CMAC/GMAC authentication code helps mitigate tampering.
Newer block cipher modes like CCM, GCM, OCB, and others roll both confidentiality and authentication into one, making it much easier to use AES correctly.
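A quick sketch of that fix (Python's "cryptography" package; key, nonce, and message are hypothetical, reusing the example above): AES-GCM authenticates the ciphertext, so the truncated "sell all IBM shares" message is rejected outright instead of silently accepted.

    # Sketch: AES-GCM detects truncation/tampering instead of decrypting it.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)          # 96-bit nonce; never reuse per key
    aead = AESGCM(key)

    ct = aead.encrypt(nonce, b"sell all IBM shares when price hits 120", None)
    print(aead.decrypt(nonce, ct, None))    # intact message decrypts fine

    try:
        aead.decrypt(nonce, ct[:19], None)  # truncated ciphertext
    except InvalidTag:
        print("tampering detected")         # GCM refuses to decrypt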
I am not sure about different keys. The AES key usage is always the same for encrypting individual blocks. AES itself does nothing but encrypt a single 16-byte block of data; the block cipher modes build on top of AES to span encryption over multiple blocks. Some cipher modes take additional parameters to customize their usage, e.g. CCM can take 64-bit to 128-bit authentication codes.