This appears to be timeable. Rack and Rails just fixed a similar bug. Over LANs, attackers can figure out the correct MAC for any message.
Also, what the heck is a "salt" on an HMAC-SHA1 signature? If you lose the secret key, you've lost the verifier. You shouldn't be using HMAC signatures in so many places that keying them individually is a problem.
Thanks - this kind of feedback is exactly why we're putting this out for review.
Can you provide pointers or a bit more information as to how the timeable issue works, and what the workaround is?
Is there a more technically correct term we should be using instead of 'salt'? And is that feature a total waste of time? Remember, we're building a framework for other developers to use - so saying "you shouldn't be using HMAC signatures in so many places that keying them individually is a problem" assumes that developers using Django (who may have no crypto / security knowledge at all) will understand that. Is this something we should cover in the documentation?
We try to keep Django's dependencies down to the Python standard library + a database adapter, so I'm pretty limited on which underlying crypto libraries I can use.
How big a deal is that timing attack? Would it be appropriate to ignore it by default, but provide people who are concerned about it the option to swap out the sha1 signing for something using an external library that isn't affected by it. Or even just drop an optional "time.sleep(random.random() * 0.01)" line in to the code that generates the signatures?
It's a big enough deal that if I'm right (I looked at your code for about 15 seconds) and you ship it, you will look ridiculous when someone blogs the exploit code.
Thomas is the security consultant, I sell bingo cards to 60 year old women, take with appropriate grains of salt: the timing attack is impractical against almost all Django installations, and is difficult to exploit Internet-wide in an automated fashion (like, e.g., you can do for Wordpress vulnerabilities).
However, the impact is probably severe if your installation is vulnerable (i.e. if the bad guy can get sufficiently local access to your site over the network, and if you allow him sufficient rejected attempts). This can happen in plausible circumstances. For example, if you host on Slicehost and the bad guy is sufficiently motivated to pay $20 a month and buy himself a slice in your data center, congratulations, it is highly likely you are vulnerable to timing attacks in this manner. (This vulnerability is by no means unique to Slicehost, I'm just using this as a practical example of how a dedicated adversary can plausibly exploit a common deployment scenario.)
If you're vulnerable, the bad guy can compromise signatures given a long enough time. I am not a Django guy but I am a Rails guy, and one common pattern we have is to store things in the session where the session is serialized into the cookie. For example, we might store the authenticated user_id there. (I do.) The signing is what keeps someone from swapping out user_id = 12387 (their freebie account) for user_id = 1 (my all-powerful admin account) and convincing the system that the user logged into their browser is me. If you can forge a signature, free admin for you.
Given that this vulnerability has catastrophic consequences for vulnerable users and you probably encourage coding practices which make users vulnerable to it, I'd put it pretty high on my list of things to fix prior to release. But again, I am not the security guy.
I think that the issue is compounded here by the fact that Django is a framework which will be used in hundreds or thousands (or more) of applications deployed all over the world. Any security issue with the framework itself instantly puts thousands of apps at risk.
If you have a single app then you can discount this kind of attack, but when you are producing code that will be used in thousands of other apps, it would be irresponsible to do so.
I think a feature that is catastrophically insecure when deployed at Slicehost is pretty much D.O.A. anywhere, and that Patrick explained it nicely.
I'm sorry for being glib, but something about this Django request rubbed me the wrong way. I get the impression (probably unjustified) that tomorrow the dev team is going to say, "well, that timing problem was the only thing anyone found, so we're good to go". In fact, if they're serious about peer review, the fact that it took 15 seconds to find a critical vulnerability would be a good reason to hold it back for many weeks and to ensure that several competent people signed off on it.
I'm conflicted though, because every other mainstream framework has this functionality, and it's not that complicated to build. I wish they'd stop advertising goofy security "features" like "salts" and key rotation, though.
We've not checked this in yet precisely because, as you say, that good review is important. We don't have anyone in our core community who's a security expert, so that's precisely why we've tried to reach out to the broader web development community for review.
I'm really sorry that the request rubbed you the wrong way. In fact, it completely sucks: you're pretty much the perfect kind of person we wanted to interest in this. So on a meta level, do you have any suggestions about how to word or structure requests like this in the future? Can you clarify what, exactly, rubbed you the wrong way? Is there something different we should/could have done?
Your main suggestion here seems to be that we give it some time (right?) and I hear that loud and clear. I agree it'd be silly to call for a review on Monday and check it in on Thursday. We've always been a pretty conservative bunch of committers; our community's pretty used to waiting while features get the review they deserve.
The problem is that there's a chicken and egg situation with any code that's not committed (or is out on a branch). The vast majority of contributors (let alone developers) only see what's on trunk, so anything that's not yet there gets a simply pitiful number of eyes on it. Once something's committed/merged, a whole lot more people see it (which is why we always end up finding bugs in "stable" code only after it's merged). But shipping insecure code is just awful. As a maintainer, it's hard for me to tell whether a lack of feedback indicates a lack of issues or a lack of interest.
Regardless: thanks for your help and advice. I really appreciate it.
If Django's core community doesn't have anyone who can evaluate these types of code changes effectively, it might be wise to have the Django Foundation hire a crypto consultant.
It seems like there'd be a great market in post-Docs looking for this type of work.
I'm curious: do you have any idea what that'd cost? My impression is that anyone who's actually worth paying would cost well beyond what the DSF could actually afford. That said, stuff like this is why the DSF exists in the first place, so it's a worthy idea.
In fact, if they're serious about peer review, the fact that it took 15 seconds to find a critical vulnerability would be a good reason to hold it back for many weeks and to ensure that several competent people signed off on it.
(I don't have the final say on that, of course, but considering that I hadn't even seen the code until today I just don't see any way to both be a responsible release manager and ship this in Django 1.2)
Forget the dumb time.sleep() idea, I just looked at the keyczar implementation of constant time string comparison. I'll roll that in to the library, thanks.
Good. By the way, the time.sleep wouldn't even have worked -- the big deal with timing things on the web is that unless you run some kind of autobanning script, they get unlimited attempts. I'm pretty sure that if I get a hundred-thousand-similar-attempt dataset I can calculate the mean time to such accuracy that the random sleep time doesn't hamper me at all.
And I can collect that data over such a large time you might not even notice.
correctMac = self.Sign(msg)
if len(sig_bytes) != len(correctMac):
return False
result = 0
for x, y in zip(correctMac, sig_bytes):
result |= ord(x) ^ ord(y)
return result == 0
This code performs the same ALU ops on every byte of the MAC, and does not perform any bytewise comparisons if the MACs aren't the same size. It's not timeable.
It is quite easy to time for yourself. Try timing it many times with a sig_bytes off by one(only one character changed). Then compare that with the correct one. You'll notice the (small) difference.
There are no tests for it in the keyczar that I could find. I doubt they tested it.
The reasons are many... but this is python, so each operation is much more complicated than C/asm. There are lots of branches, memory usage and cache effects - so code is much more input dependant - especially the integer ops used here. Take a look at this code to see one of the reasons why this python code has variable timing depending on input(like most code does).
x,y = 3,3
id(x) == id(y) # python shares ints in different variables
The only data-dependent comparison you've got to work with is a comparison between two register-scale fixed-width integers. I think you're doing something wrong.
Perhaps whatever you're doing to create the "off by one" case is disrupting the cache, and you're just seeing the effects of that.
> The sleep needs to be long enough, and random enough that it makes the other variance statistically insignificant.
Given the restrictions, this is actually impossible. (the operation has to be quick enough that it doesn't significantly erode user satisfaction, and the smallest possible "sleep" is a single machine instruction). Your adversary can easily collect millions of samples, or even a lot more. You cannot hide the latency difference well enough.
Personal opinion: It's probably one of the reasons we're the most popular Python web framework, especially amongst people with no prior Python experience.
We may change that policy in the future (we've discussed it) but we're not going to change it just so the signing feature can get in to Django 1.2 - and I'd really like to get signing in to 1.2.
Nobody is going to have their identity stolen because of this feature, but it would be nice if they stopped implementing any more crypto. Arguing against signed cookies is tilting at windmills, but that doesn't mean marketing crypto features is a smart move.
Adding to what Simon said, dependencies in Python are an absolute mess, with no truly reliable mechanism other than saying "here's a list of what you'll need, go install it all manually". So every additional dependency is a major burden in terms of packaging, documentation, learning curve for newbies, etc. and we try to avoid inflicting that unless it's absolutely necessary.
(though personally I think I'd prefer to outsource crypto to, you know, people who actually know how to do it)
I just had a more detailed look at Keyczar and it's pretty awesome. Unfortunately it depends on pycrypto which is a tough dependency for Django as a whole.
We already have a mechanism for allowing you to "swap out" the hashing backend used for Django's get_signed_cookie and set_signed_cookie functions. It wouldn't be at all hard to write an optional keyczar backend for that. I think I'll do that. I'll need to extend the configurable backend aspect to cover other things within Django that use signing as well.
I just tested the python fix there, and it was easy to find the difference with the python function given.
The solution is to rate limit, and add random small sleeps. A tiny random sleep stops the variance from a string compare showing up. Rate limiting and blocking slow down an attacker from trying out lots of attempts anyway.
If you have decent signal processing skills you should be able to show another keyczar exploit.
That's a terrible fix. Adding noise degrades the signal without eliminating it. To defeat that countermeasure, you just take more measurements. The random sleeps, which are by nature random, get filtered out.
The Keyczar Python fix --- if we're talking about Nate Lawson's code --- has no data-dependent timing. It invariably does the same ALU ops on every byte of the string, and fails if the strings aren't of the same length. That's the right solution to side channel vulnerabilities: eliminate the side channel.
I just want to say that it is really refreshing and awesome to see a framework developer/author/team reaching out to the community as a whole for this kind of review, rather than requesting feedback from a small group of committed contributors.
I can tell you're pushing for terseness in the signed text (every character does count), but I'd definitely consider explicitly describing the cryptosystem and version you're using in each cookie.
Also, is there some code elsewhere ensuring that the separator character is always escaped during signing? I don't see anything explicit, but I could definitely be missing something; I haven't messed with Django internals much..
This is a pointless feature and Django shouldn't implement it.
If Django-signed messages are being persisted long-term in something other than an HTTP cookie, Django developers are abusing the feature. The sliver of code that's been published is not sufficient to protect long-term persisted data.
If Django-signed messages are simply being used for cookies and token URLs, then it doesn't matter whether they're forward/backward compatible. They have no long-term value. Applications should fail gracefully when unintelligible messages are presented. In 99.999% of proper cases, this simply means bouncing the user back to the login prompt to get a new cookie.
This isn't just pedantry. A fair subset of all cryptosystems have failed in later revisions because of negotiation vulnerabilities. Along with the "automatic key rotation" misfeature, this idea is as likely to burn you as it is to protect you.
You're right that it should just fail gracefully, and that people shouldn't use the data for anything long-lived. I'm not so sure the latter won't happen often enough for compatibility to become an expected feature, though.
What I'm afraid of is the future hypothetical case where the scheme changes in the future and a Django developer decides to add in backward compatibility anyway--by having the verifier check the presented text against both the old scheme and the new one.
If your main point is "don't reinvent the wheel, use an established system," though, I totally agree.
Negotiation in cryptosystems is usually a bad idea. It's a bad idea here. Anything that would make that feature useful would be an abuse or a threat.
One of the worst ideas Simon is getting from Reddit right now is that he needs to make this system more sophisticated. Version the cryptosystem! Use truncated SHA256! Revoke messages on MAC failures! Use random sleeps! Automatically expire keys! Look at this NIST standard I found!
What they need to do is what every other web framework does, because every other framework has been inspected already.
"Applications should fail gracefully when unintelligible messages are presented. In 99.999% of proper cases, this simply means bouncing the user back to the login prompt to get a new cookie."
Consider a load balancing pool where a fraction of the servers accidentally get stuck on the old cryptosuite, or a few get prematurely upgraded to a new cryptosuite. Blindly bouncing to the login prompt a few percent of the time would be painfully difficult to debug.
Idea: the plaintext shall include a version number, and possibly an identifier like an IP address for the server that generated it. If the version number mismatches, log an explanation and the identifier it on the server then bounce to a login prompt.
To expand on the first part: with something up front saying "this key is signed using system FOO", you'll be able to support multiple systems in the future. This means that you'll be able to handle multiple systems (and upgrades) easily.
If the server can handle keys that are signed by either system FOO or system BAR, and the keys themselves don’t provide any clue about how they were signed, the server can just try to verify the signature using FOO and, if that system doesn’t return a “signature valid” response, fall back to BAR.
Obviously if you’re permitting ten different signature-verification algorithms then this technique starts putting a significant load on the server, but if your protocol allows for ten different signature-verification algorithms, then you have bigger problems, right?
OW MY BRAIN. No, don't truncate SHA256 to SHA1 sizes. No, don't use MD5 to make your URLs shorter. No, DO NOT clear all of a user's signed cookies when an HMAC fails --- these aren't passwords, they're crypto secrets.
Also, what the heck is a "salt" on an HMAC-SHA1 signature? If you lose the secret key, you've lost the verifier. You shouldn't be using HMAC signatures in so many places that keying them individually is a problem.