This is a great essay on why you should never use a hash function for message authentication.
Except not for the reason the author thinks.
There are several problems here.
First, with SHA-1 for example, you have 64 bytes per chunk. That means you basically get a free ride on this problem for anything < 64 bytes. A lot of "application state" fits pretty well in 64 bytes.
Secondly, unless a message ends right on the 64 byte boundary, it is not nearly that simple. You have a bit of a problem, because the hash is padded, and when you add extra characters to your original string, that padding gets replaced with those values. So, it's no longer simple to just "keep going" from where you stopped.
Still, you can see how that leaves a distinct subset of cases where you'd be exposed. SHA-1, along with most secure hash functions, appends the length of the message the end of the source text before performing the hash function. That means that if you add even one byte to the string, you have now changed the last 8 bytes that were fed in to the "original" hash function. Oh, and your extra byte goes in before those bytes, so not only did you change those 8 bytes, but you shifted them down a byte.
So, no, it isn't nearly that easy to crack a SHA-1 based authentication, and yet, it is easy enough that you should totally NOT use them for authentication and instead use HMAC ; they are vulnerable to extension attacks, it's just not nearly as easy as this article suggests, and conclusions one might draw from this article (like you can solve this problem by feeding all source text in to the hash algorithm backwards) are likely ill founded.
It just turns out that cryptography is way more complicated, and even in terms of understanding weaknesses that arise from doing things wrong, you are going to get it wrong. Trust the experts, when they say it is a bad idea, but don't assume why it is a bad idea can be explained in a short blog article like this.
UPDATED: Added an explanation as to why it might be dangerous to just take this article at its word.
In practice, attackers just guess the bit length of the message by writing the Ruby script that generates the 100 candidate messages at different expected bit lengths and feeding them to their fuzzer. The bit length issue is not a real one in practice.
In practice, length-extendable SHA-1 MAC functions are mechanically exploitable; they'll be discovered quickly by competent attackers who don't even have access to your source code, because attempts to break them require very little code and even less time.
It is a very, very bad idea to put off fixing a problem like this because you've been led to believe by someone on a message board that MD padding is going to be an obstacle to attackers. It isn't. The Flickr MAC break required MD padding, too; Thai and Juliano broke it just the same.
There seems to be some kind of impedance. I'm not suggesting you can avoid using an HMAC and get away with just a hash.
Let's be clear about what an HMAC is. It's pretty much what this article describes, but executed with a specific protocol that avoids exploitable weaknesses in naive approaches.
It's that similar. The biggest non-security consequence is that you have to call your hash function twice. If that is a problem, you have bigger problems. If you are using a secure hash to authenticate a message, you should always use HMAC.
> In practice, attackers just guess the bit length of the message by writing the Ruby script that generates the 100 candidate messages at different expected bit lengths and feeding them to their fuzzer. The bit length issue is not a real one in practice.
You're closer to addressing the real complexity involved in cracking it than the original article. I agree that in principle it isn't hard to break a naive hash-based digest authentication. I'm merely pointing out that it isn't simple.
Also... not sure why they'd have to use Ruby or why that is relevant...
> It is a very, very bad idea to put off fixing a problem like this because you've been led to believe by someone on a message board that MD padding is going to be an obstacle to attackers.
No. It's a very, very bad idea to ever go down this road in the first place. Start with a MAC. If you think you can be more clever than the collective expertise of the field, study cryptography and prove it, but until you've finished studying it, do what they say. History is littered with people who thought they knew just enough about cryptography that they could do things their own way without consequences (WEP anyone?).
In general I'm all for encouraging amateurs to compete with the pros, but cryptography is definitely one of those fields where it can appear deceptively simple to achieve your goals, and it totally, totally, totally is not.
It's fine that you want to tell the thread what HMAC was, or point out that the exploit for this problem isn't literally 100 words of blog post long.
But the reality is that this is an exploit that everyone on our team, and I assume all our real competitors teams, are expected to be able to bust out on demand. It's not hard. It comes up a couple times a year in web apps, we spot it, and we exploit it. It's really not one of the more interesting or involved crypto flaws to exploit.
You wrote:
Still, you can see how that leaves a distinct subset of cases where you'd be exposed. SHA-1, along with most secure hash functions, appends the length of the message the end of the source text before performing the hash function. That means that if you add even one byte to the string, you have now changed the last 8 bytes that were fed in to the "original" hash function. Oh, and your extra byte goes in before those bytes, so not only did you change those 8 bytes, but you shifted them down a byte.
Well, respectfully, no shit. Was it your understanding that the Flickr team screwed their MAC up so badly that you didn't even have to guess glue padding to pull the attack off? You wrote this paragraph without context; you even followed it with a sentence about how you might be safer if you could fit your application state in a SHA1 message block because I forget why that could possibly matter?
Next time, when you want to point out how deceptive the apparent simplicity of a crypto concept is, choose one of the defender's problems. The attacker's problems are mostly deceptive in the opposite direction.
I think I was pretty clear that extension attacks work. I was clear that they work despite features of a hash algorithm that would appear to address the issues highlighted in the article. I was also very clear that it is the defender's problems that are deceptively hard.
In a couple of cases I probably should have said "simple" instead of "easy". "Easy" implies more about effort than complexity. At some points I meant "little effort" and other points I meant "little complexity", but I used the same word for both, so that's bad.
Still, I don't get how you inferred the above from this, which to me reads as "any efforts you might make to address this problem your own way will almost certainly fail miserably":
It's just not nearly as easy as this article suggests, and conclusions one might draw from this article (like you can solve this problem by feeding all source text in to the hash algorithm backwards) are likely ill founded.
"Never use a hash function for message authentication" is such a simplistic view. The author takes a common hash function design (Merkle-Damgard), and somehow extrapolates that hash functions should be simply ruled out for authentication.
First, this may send the wrong message to the less-focused reader: "what, I should use block ciphers instead?". Luckily, HMAC is eventually brought up, which is a fine solution.
HMAC requires 2 calls to H, our favorite hash function. Certain applications may find the overhead to be prohibitively high. With non-broken hash functions (e.g., any of the SHA-3 finalists), we can use the so-called envelope authenticator: A = H(K||M||K), with some padding to separate K from M to keep security proofs happy. This is significantly faster for short messages, and short is the most common size out there.
> Certain applications may find the overhead to be prohibitively high. With non-broken hash functions (e.g., any of the SHA-3 finalists), we can use the so-called envelope authenticator: A = H(K||M||K), with some padding to separate K from M to keep security proofs happy.
I think it is safe to say that you are suggesting a very narrow context. If invoking a hash twice is prohibitive and you are looking at choosing amongst SHA-3 finalists as an alternative (particularly given that many of them are slower than SHA-2, and I believe they are all slower than SHA-1 on low cost CPU's where you might expect this constraint), you are already talking about a very narrow performance window (literally one iteration of Moore's Law covers the gap). You then throw in that people are making refined selection of hash algorithms taking in to account various security nuances, and I think you are dealing not only with a narrow context, but a degree of cryptographic expertise that you don't have to worry that anyone solving it would be consulting this blog or Hacker News in general. ;-)
First, this may send the wrong message to the less-focused reader: "what, I should use block ciphers instead?". Luckily, HMAC is eventually brought up, which is a fine solution.
If the reader can't be bothered to read the article to the end, I hardly think it reflects on the author. Whilst it might indeed be a more concise article if it just said "don't use a hash function for message authentication, use HMAC", it would still miss the important final point about timing attacks, not to mention the journey of explanation about why you shouldn't just use a hash function.
Your point about padding and length bytes is spot-on. I actually left this out of the explanation, although it's interesting, because I felt it a useless diversion: something that's safe except in an easily describable subset of cases is still not fit for purpose when specially designed tools without these problems exist.
Point being, although hash functions might work most of the time, their general construction does not make them safe for this purpose. You can't say something is 'mostly secure' because it works 'most of the time'. The times is doesn't work will affect someone, and for them the system is not secure at all.
Plus, implementations with weird edge cases make for horrible debugging, especially when it comes to security.
> Your point about padding and length bytes is spot-on. I actually left this out of the explanation, although it's interesting, because I felt it a useless diversion: something that's safe except in an easily describable subset of cases is still not fit for purpose when specially designed tools without these problems exist.
I'm with you except for: "a useless diversion".
From the description in your article, I might think that if I have really short messages, I'm okay. I might think that if I reversed my strings, so any additional state is prepended instead of appended, I'd be okay. I might think that if I truncate my hashes (say, used the first 256 bits of SHA-512), I'd be okay. I'd be really, really wrong.
> Point being, although hash functions might work most of the time, their general construction does not make them safe for this purpose. You can't say something is 'mostly secure' because it works 'most of the time'. The times is doesn't work will affect someone, and for them the system is not secure at all.
That's the wrong point though. It's not that hash functions "might work most of the time". Uncompromised secure hash functions work all of the time for their intended purpose. For this other purpose, they work exactly none of the time.
The aspects I described that might make you think SHA-1 actually has this covered except for a few corner cases are trivial challenges for an attacker to overcome, as are all of the above "remedies" I mentioned (all of which resemble idiot moves that have in fact been made by naive developers who thought they could be clever and save themselves... I don't know.. maybe some CPU time?).
I do think the post was some great writing that illustrated an idea quite clearly. I think it could be quite informative for a lot of people. I just think what you wrote would benefit from being framed appropriately. I'd suggest a preface... something like, "HMAC's were invented to address the fact that you can't just use a hash for authentication. Somehow people miss this point, possibly because HMAC's seem like a trivial layer on top of a secure hash. In practice, there are a surprising variety of issues. To give you an idea of the kinds of problems you might run in to, and hopefully a sense that you can't just naively tweak your use of a hash to address them, I'm going to simplify a common problem with using secure hashes for authentication that has probably never occurred to you. I assure you that this problem is actually more complex, and there are many more problems."
Except not for the reason the author thinks.
There are several problems here.
First, with SHA-1 for example, you have 64 bytes per chunk. That means you basically get a free ride on this problem for anything < 64 bytes. A lot of "application state" fits pretty well in 64 bytes.
Secondly, unless a message ends right on the 64 byte boundary, it is not nearly that simple. You have a bit of a problem, because the hash is padded, and when you add extra characters to your original string, that padding gets replaced with those values. So, it's no longer simple to just "keep going" from where you stopped.
Still, you can see how that leaves a distinct subset of cases where you'd be exposed. SHA-1, along with most secure hash functions, appends the length of the message the end of the source text before performing the hash function. That means that if you add even one byte to the string, you have now changed the last 8 bytes that were fed in to the "original" hash function. Oh, and your extra byte goes in before those bytes, so not only did you change those 8 bytes, but you shifted them down a byte.
So, no, it isn't nearly that easy to crack a SHA-1 based authentication, and yet, it is easy enough that you should totally NOT use them for authentication and instead use HMAC ; they are vulnerable to extension attacks, it's just not nearly as easy as this article suggests, and conclusions one might draw from this article (like you can solve this problem by feeding all source text in to the hash algorithm backwards) are likely ill founded.
It just turns out that cryptography is way more complicated, and even in terms of understanding weaknesses that arise from doing things wrong, you are going to get it wrong. Trust the experts, when they say it is a bad idea, but don't assume why it is a bad idea can be explained in a short blog article like this.
UPDATED: Added an explanation as to why it might be dangerous to just take this article at its word.