Correct me if I'm wrong, but it appears Django isn't the only framework/technology vulnerable to such an attack; it's just one of the first to provide a mitigation strategy (resulting in this post). Any application that:
* Is served from a server that uses HTTP-level compression
* Reflects user input in HTTP response bodies
* Reflects a secret (such as a CSRF token) in HTTP response bodies
is vulnerable, regardless of technology.
The mitigation strategies were given in the original paper[1]; this announcement is just a repeat of what's in there. That said, it's exactly the right thing to do, so that's not a knock on Django.
From my understanding (which may be wrong...), the requirement of "Reflect user-input in HTTP response bodies" is actually pretty important. If the application only does this on POST requests, then it should probably be fine. Since an attacker cannot formulate a valid POST without the CSRF token (assuming the app is using CSRF tokens correctly), then there is no way for an attacker to get this attack bootstrapped.
If the application reflects GET request input in the response (e.g. `https://domain.tld/?q=ASDF` results in some `value="ASDF"` being included somewhere in the response), then it is indeed likely vulnerable. This allows the attacker to simply keep changing the value of `ASDF` as they guess and check for some secret on the page.
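To make the reflection requirement concrete, here's a minimal hypothetical Django view with exactly that vulnerable shape (any view that echoes a GET parameter into a page that also renders a CSRF token qualifies):

    # views.py -- hypothetical vulnerable view, for illustration only
    from django.shortcuts import render

    def search(request):
        query = request.GET.get("q", "")
        # search.html renders both {{ query }} (attacker-controlled) and
        # {% csrf_token %} (the secret) into one compressed response
        return render(request, "search.html", {"query": query})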
Of course, if your application is allowing untrusted POSTs to be made, then you will still have to worry about POST requests...
Absolutely, I'm just saying that your statement that it's the only must-have is factually incorrect. I could include some sort of secret on every page, but since my blog doesn't reflect user input, it would be fine. I could also have a 'search' function that reflects input, but without the secret in the body, the secret would be fine.
All three parts are a must-have, even if they are incredibly common. Saying otherwise is misleading.
Yes, that's correct, in theory BREACH can be used to target any sort of secret embedded in the body of an HTTP response. CSRF tokens are the most common type of secret in that category, but there are others. However, we can't speak authoritatively to The Web or All Web Frameworks or anything, but we can advise our users on how they can protect themselves.
I suspect that the particular mitigation strategy the BREACH authors describe as "Randomizing secrets per request" could be implemented by having {% csrf_token %} instead emit:
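    <!-- a sketch; the two field names are taken from the description below -->
    <input type="hidden" name="random_data" value="{{ random_data }}">
    <input type="hidden" name="csrfmiddlewaretoken_xor" value="{{ csrfmiddlewaretoken_xor }}">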
where the random_data changes every response, the emitted csrfmiddlewaretoken_xor is the real token XORed with the random_data, and upon submission the server will again XOR the two values together to get the real CSRF token.
There may be other secrets that need protection in other ways, and maybe this would make any random-source issues more exploitable... but this would seem to protect the CSRF token, in a cheap and minimal way.
UPDATE: Thinking further, though, maybe the attacker can probe for both values at the same time, and thus determine the probability of certain pairs, and thus this only slows the attack? I'd appreciate an expert opinion, as this was the first mitigation that came to mind, and if it's wrong-headed I'd like to bash my intuition into better shape with a clue-hammer.
Thanks, but can you clarify... does that mean probing (a..z)×(a..z) (one pair per probe), so there's at least a giant increase in probing required per character? And perhaps even more each character in, since probing for the Nth character now requires (a..z)^(N-1) × (a..z)×(a..z) ?
(I'm guessing also, though, it may be possible to probabilistically probe multiple ranges of the secret at once... in a process that seems vaguely similar to forward-error-correction coding.)
1) The attacker must be on the same network as you, or at least be able to detect how large the compressed and encrypted replies are.
If you are on the same network, it seems to me there are far more MITM and similar attacks that are more likely to succeed if you do not use HSTS (or secure DNS, if that helps).
2) The attacker must be able to make your browser rapidly generate many (how many?) requests to the site. It takes "30 seconds" they claim, but is that at a rate of 100 requests per second?
3) Each request must carry something that will be reflected by the body of that particular page when it's rendered. I suppose it could be an error message or search string that's echoed.
It seems to me that unless you generate a CSRF token unconditionally on every page, the subset of pages that both reflect something with no protection (e.g. search results) and have a protected form (e.g. change my email address to XYZ) might be small.
4) The secret that can be extracted is what's in the reply body and not the headers -- headers are not compressed, since the TLS compression is now universally disabled post-CRIME.
Personally I use Referer header checking as well. IME all the browsers of my users do send them. So if you extract the CSRF token, it's useless by itself unless you also can make the browser send the right Referer header (and AFAIK, all the holes such as Flash have been plugged).
Other than that -- it seems that if you are normally generating e.g. a 32 byte CSRF key, you could interleave it with 32 bytes of good randomness per request?
>Personally I use Referer header checking as well. IME all the browsers of my users do send them. So if you extract the CSRF token, it's useless by itself unless you also can make the browser send the right Referer header (and AFAIK, all the holes such as Flash have been plugged).
This long comment boils down to "so this is an attack that negates the protection that TLS gives you in defending against CSRF attacks". Yes, in the CSRF case, that's what the attack is.
> It takes "30 seconds" they claim, but is that at a rate of 100 requests per second?
I took the 30 seconds to be the amount of compute time required after they had their samples. Otherwise it is a meaningless number, they could say it takes 0.5 seconds at a rate of 6k requests a second, or 3 thousand years at the rate of 1 request per year.
>Other than that -- it seems that if you are normally generating e.g. a 32 byte CSRF key, you could interleave it with 32 bytes of good randomness per request?
Which would be pretty easily averaged out with a few more requests. It makes the attack a little harder, but not substantially so.
One could argue that when talking about security, it's always about making things harder to breach, not about foolproof protection.
I'm not sure how adding 32 bytes of "good randomness" would help, because random data doesn't compress away: the total size would stay very similar, and the slight variation in size from the guess would still be very relevant.
However, adding between 1 and 32 bytes of randomness might be a pretty good counter! I.e., if you request the page with "guess letter A", then request the same page with the same "guess letter A" again and the encrypted output differs by up to 32 bytes, it's very hard to tell whether something compressed better.
The cool thing about that is that it's fairly trivial to do with most implementations of CSRF. Thoughts?
The attacker just needs to work out the average size of the normal response, then the average size of the response with the extra character. Send enough requests and the difference will be noticeable. It slows them down, but doesn't fix the problem.
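A toy simulation of that averaging (all numbers invented: assume a wrong guess costs about one extra compressed byte and every response gets 0-32 bytes of random padding):

    import random

    def observed_size(extra, base=500, pad_max=32):
        # every response carries 0..pad_max bytes of random padding
        return base + extra + random.randint(0, pad_max)

    def mean_size(extra, samples):
        return sum(observed_size(extra) for _ in range(samples)) / samples

    for n in (10, 100, 10000):
        # a wrong guess costs ~1 extra compressed byte vs. the right guess
        diff = mean_size(1, n) - mean_size(0, n)
        print("n=%d: mean difference ~ %.2f bytes" % (n, diff))

With enough samples, the means converge and the one-byte signal pokes through the 32 bytes of noise.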
Regarding #1 and the other attacks that are more likely to succeed, are you referring to the propensity of users to bypass certificate warnings, or is there something else in play there?
Well, the certificate warnings have gotten pretty grave now, though I could imagine showing users an intermediate page telling them Microsoft has screwed up and they may get certificate warnings... which they should just ignore, trust us.
I was just thinking of hijacking DNS locally (i.e. when browser asks for yourbank.com, send them your IP), and making yourbank.com then redirect to yourbank.myfreehost.com -- which could have a legitimate SSL certificate and a copy of all the branding.
That seems more likely to succeed to me. I guess you could try BREACH if you have some high value target you know is using a very specific website. Anything bitcoin related: either the users themselves or admins. Robbing bitcoins is like robbing banks in the Wild West.
> Personally I use Referer header checking as well. IME all the browsers of my users do send them.
IME that is not true. Certain corporate/government machines have Referer headers turned off for some strange "security" reason. The last time I ran into this was when working with the Architect of the Capitol (http://www.aoc.gov/), when they couldn't log into a management panel because Django's CSRF protection checks the Referer header.
All these framework-vendor guides will recommend switching off Gzip, because it's a content-neutral workaround; it works everywhere, for every instance of the attack, no matter how you've coded your app. There are more specific workarounds, but they require changing how you encode secrets into your page, so there can't really be a vendor guide on how to do that; the vendor doesn't know how and where your app sticks secrets into its views, after all.
I'm sure everyone is going to come up with workarounds that re-enable compression, but they'll be context-dependent and will involve code; in the meantime, the attack is straightforward and viable. Think of disabling compression as a stopgap.
Definitely think about it before just doing it though...
Disabling compression can break some apps, especially ones that rely on huge compression ratios for text (a 5-10x ratio is common for JSON-heavy responses, for example). So it is not an app-agnostic workaround: a 100 KB compressed JSON response can turn into a 1 MB response, and the more data you have to send, the greater the chance of error, especially on 3G/2G networks.
For many high end projects, just disabling compression without regard to testing or having an idea of what the application is doing would get you fired or taken to court.
Not only would this break apps, but it would also lose business in that there is evidence from Amazon and others that every 100ms extra latency can cost 1% in sales.
From the SPDY whitepaper: "45 - 1142 ms in page load time simply due to header compression". Remember that headers use the upload side of the link, which means too many headers can saturate the upload and stall the whole internet connection for everyone using it. Common upload limits are only 5-10 KB/second, so excessive headers combined with many requests can easily DoS many internet connections.
I spend a lot of time optimising websites for these reasons, and disabling compression could add 20 seconds of load time for a good percentage of users.
So, for many apps, turning off compression is no solution at all. You might as well just disconnect your app from the internet - that will also give you a secure and broken app.
A proper risk and impact analysis should be done first. Too often, quick hot fixes to security issues just break things, or even make things less secure.
Yeah, turning off compression completely is kind of crude, but it works without having to go into app specifics; I'm sure the Django folks are on it ;). When it comes to things like CSRF tokens, secret masking (4) is probably the easiest to implement, so, something like the sketch below.
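A minimal sketch of per-response masking (function names are mine; the pad and the masked value both go into the page):

    import os

    def mask(secret):
        """Return (pad, secret XOR pad); the pad is freshly random per response."""
        pad = os.urandom(len(secret))
        masked = bytes(a ^ b for a, b in zip(secret, pad))
        return pad, masked

    def unmask(pad, masked):
        return bytes(a ^ b for a, b in zip(pad, masked))

On submission the server XORs the two submitted values back together and compares the result against the real token; the bytes on the wire differ every response even though the underlying token doesn't.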
Length hiding by adding random noise was shown to be ineffective in the article. Perhaps a fixed-length response would work better, or perhaps one that is heavily quantized? Really, production environments are not the place to try un-vetted academic crypto research.
> Really, production environments are not the place to try un-vetted academic crypto research.
A very salient piece of cautionary advice. Disable gzip to protect prod. Figure out what to do to allow compression off prod, and engage the devs of your stack/framework to do this correctly.
You can still safely compress your static files. So, assuming that you don't send any secrets in your CSS, JS etc., you can configure your server to enable gzip only for these resources.
For example, with nginx and gzip off globally, you can do:
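    server {
        gzip off;  # global default

        location /static/ {
            # static assets reflect neither user input nor secrets, so
            # compressing them is safe (the /static/ path is just an example)
            gzip on;
            gzip_types text/css application/javascript;
        }
    }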
Doesn't matter if they share cookies. Static files reflect neither user-supplied data nor secrets in their contents, so they can't be used in a BREACH attack.
That's unlikely to affect you if you're already embedding CSRF tokens in your responses, which would defeat caching anyway. I'm curious whether this response-length fiddling will mitigate the attack; can anyone more knowledgeable than me confirm?
A few days ago, Meldium's announcement of a Ruby gem that provides an inexpensive partial protection (i.e. not disabling gzip) made it to the HN front page:
The two protective measures are masking the Rails CSRF token and appending an HTML comment to every HTML doc to slow down plaintext recovery. How easy would this be to include in a Django plugin?
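The comment-appending half looks small enough as Django middleware; a rough sketch (names are mine, it assumes a non-streaming HTML response, and it's not the gem's actual code):

    import secrets

    class LengthHidingMiddleware:
        """Append a random-length HTML comment to HTML responses."""
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            response = self.get_response(request)
            if response.get("Content-Type", "").startswith("text/html"):
                # 2..128 hex chars of hard-to-compress padding
                padding = secrets.token_hex(secrets.randbelow(64) + 1)
                response.content += ("<!-- %s -->" % padding).encode()
            return response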
This would cause a big problem for us. Our mobile web service serves around 3-4k concurrent requests on average; without compression, our API would see a 300%-900% increase in delay.
Are there any alternatives? I'd like to know what CloudFlare would do, as their CDN is based on compressed nginx responses.
As gzip compression only applies to the content of the page, not the headers, I would assume that prefixing your page with content that is variably compressible and of varying lengths would throw a monkey wrench in the attacks.
The compressed content of any part of a page very much depends on what came before it. Altering the content to include a script comment block full of random text and various common HTML and JavaScript elements (Markov chains anyone?) would definitely change how a page is compressed.
If the compressed length of the replies varies significantly with every request - even if the request content is identical - attacks like this can no longer reveal hidden information.
Edit:
You could improve this significantly by including false-positive matches as well. If your HTML content has csrf="45a7..." in it, you could hash that content into enough material to generate 19 or so identical-looking code blocks embedded in a script comment. You've now given the attacker a 95% chance of attacking the wrong one, i.e. increased the number of attacks they'll need to try by 20x.
This method (minus the above part) would actually be cacheable by smart CDNs like Cloudflare.
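For what it's worth, a sketch of deriving those decoys deterministically from the page content, so the same page always renders the same decoys (names and count are mine):

    import hashlib

    def decoy_tokens(page_html, token_len=40, count=19):
        """Derive `count` decoy tokens from a hash of the page content."""
        seed = hashlib.sha256(page_html.encode()).digest()
        return [
            hashlib.sha256(seed + bytes([i])).hexdigest()[:token_len]
            for i in range(count)
        ]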
The attacker has to be able to issue requests on behalf of the user with injected "canary" strings. I fail to see a practical exploit where one can do this and wouldn't have access to the secret in the response anyway. What am I missing?
Does any GET or POST URI endpoint in your application accept parameters? Do none of those parameters impact the output of the application? That set of circumstances is extraordinarily common.
The request has to be issued by the attacker from the victim's browser. If the attacker can do that, why is he unable to read the response to that request?
Edit: I think I can see a scenario where a third-party website does these requests via an <iframe> or an <img>. I'm not sure there's a way to do POST quite as easily.
Do you understand how CSRF works? Just think of it in terms of CSRF. Since the attacker is trying to infer page content, they don't care that the server rejects all the probing requests, so CSRF protection doesn't help you as the attacker carries out the BREACH/CRIME stuff. If the result of the attack is an inferred CSRF token, they then cap the whole exploit off with a (now working) actual CSRF attack.
I understand how the attack works, the question was about how a practical exploit would actually be carried out. I've figured out how one would issue GET requests from the right environment, but I don't know if the same is possible for POST.
No, you're missing that the original GET requests can be performed in some cases over HTTP, either by forgery or by surreptitiously spoofing the user's own browser into doing it. No need to have compromised the SSL/TLS.
How about having the CSRF token change with each request? If it's encrypted/signed by the server for each request with a random IV then it would be different in each request. It would be a bit more processing on the server (decrypt vs just HMAC verify) but it would be completely different each time. It seems kind of belt and suspenders as you're encrypting data within an encrypted channel but I think it gets around this issue.
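Something like this, using the `cryptography` package's AES-GCM (key management hand-waved; the point is just the fresh nonce per response):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    KEY = AESGCM.generate_key(bit_length=128)  # server-side secret

    def encrypt_token(csrf_token):
        nonce = os.urandom(12)  # fresh IV per response
        return nonce + AESGCM(KEY).encrypt(nonce, csrf_token, None)

    def decrypt_token(blob):
        nonce, ciphertext = blob[:12], blob[12:]
        return AESGCM(KEY).decrypt(nonce, ciphertext, None)  # raises if tampered with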
If the CSRF token changes with each page view then opening a second page (perhaps an explanation for a form field) in a new tab/window would invalidate the form in the original tab/window.
Not necessarily. The token can be used simply to verify that the request came from a legit page and not a cross-site request. The encrypted CSRF token need only be verified by the server to check that it's not expired; the server can store the expiration in the token itself (encrypted and signed). It does not need to maintain a list of CSRF tokens.
I don't think so. Here's the snippet from the linked PDF[1]:
> DEFLATE [2] (the basis for gzip) takes advantage of repeated strings to shrink the compressed payload; an attacker can use the reflected URL parameter to guess the secret one character at a time.
By encrypting the CSRF token (or any other "secret" data you want to roundtrip from server to client and back) with a random IV per request this wouldn't work. The value sent by the client would not be the same as the new token generated by the server (since each has a random IV). Even though the decrypted value of each token is the same, the values presented to the client in the response body are each different and not predictable (to the client).
Has anyone looked at mitigating the attack by changing the behavior of chunked transfer encoding?
Chunked transfer encoding is basically padding that a server can easily control, without having to change the content or behavior of the backend application. A web server could easily insert an order of magnitude more chunks, and randomly place them in the response stream.
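To illustrate the idea (my own sketch, not an existing server feature): split an already-compressed body at random offsets, so the chunk-size framing pads the response unpredictably. Transfer-Encoding is applied after Content-Encoding, so none of this framing goes through gzip:

    import random

    def random_chunked(body, max_chunks=16):
        """Encode `body` as HTTP/1.1 chunks split at random offsets
        (assumes a body more than a few bytes long)."""
        k = random.randint(1, min(max_chunks, len(body)) - 1)
        cuts = sorted(random.sample(range(1, len(body)), k))
        out = b""
        for start, end in zip([0] + cuts, cuts + [len(body)]):
            piece = body[start:end]
            out += format(len(piece), "x").encode() + b"\r\n" + piece + b"\r\n"
        return out + b"0\r\n\r\n"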
I'm not sure I fully understand the proposed fix here, how does it differ from the application simply including random chunks of data inside the response?
This area of things isn't my strong suit, but assuming this is analogous to just adding random data to the response, I believe it can be worked around by making more requests and using statistics to factor out the introduced noise.
I have seen random workarounds at the app level as well, where the app adds a random-length HTML comment at the end of the page.
But if randomness can be statistically removed, then they shouldn't add a random amount. Maybe just track the max size of the returned response and always pad to that max size; then the lengths of all the pages will always be the same. This is still better than turning off compression completely. A typical max for a detail page in an app might just be the size of the page plus 256 bytes per app output field.
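That pad-to-a-boundary idea in a few lines (my sketch; it quantizes to a block size rather than tracking a per-page max, and uses hex filler because runs of identical characters would just compress away again):

    import os

    def quantize_length(html, block=256):
        """Pad `html` (bytes) with random hex inside a comment so its length
        is a multiple of `block`. Hex still compresses somewhat, so this only
        roughly quantizes the compressed size."""
        shortfall = (block - len(html) % block) % block
        if shortfall < 9:  # room for the "<!-- " and " -->" wrapper
            shortfall += block
        filler = os.urandom(shortfall).hex()[: shortfall - 9]
        return html + b"<!-- " + filler.encode("ascii") + b" -->"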
So, it seems that even if I encrypt everything, a lot of information is still present in the size of the encrypted message: in the case of VOIP, it's possible to guess the speech being transferred over an encrypted transport; in the case of text, it's possible to figure out secrets if the attacker can modify an equally-sized part of the message.
Is there any general way of preventing this kind of attack? Inserting random data could work, but its distribution would have to be exactly right for the attack to be impossible over longer periods of time. For the BREACH case, we could solve it by not compressing user input, but what about the VOIP case?
Also, why does the site http://breachattack.com/ say that "Randomizing secrets per request" is less effective than disabling compression?
> Is there any general way of preventing this kind of attacks?
Disabling compression is a 100%-effective countermeasure for compression oracle attacks.
> Also, why does the site http://breachattack.com/ say that "Randomizing secrets per request" is less effective than disabling compression?
Putting random data in the server response will only slow down the attack. With enough requests, the noise from that random data will wash out.
Disabling compression will stop the attack cold. The whole thing is predicated on analyzing the size of the compressed text. No compression, no compression oracle.
How is it that random data would only slow it down? If I add a random field to my response of variable length, say, from 0 to 50, with completely random characters, it should completely throw off this attack. The length of the output will change from request to request.
I suppose, given infinite time, you could send the same request over & over and map the variance of content lengths, and get an idea of what the actual content length was before random padding? But the compression seems to throw that off even more AFAIK - because the data we pad with is random, it could very well accidentally compress well because of the rest of the data in the response, further throwing off any guesses.
Edit: From the pdf on breachattack.com:
> While this measure does make the attack take longer, it does so only slightly. The countermeasure requires the attacker to issue more requests, and measure the sizes of more responses, but not enough to make the attack infeasible. By repeating requests and averaging the sizes of the corresponding responses, the attacker can quickly learn the true length of the ciphertext. This essentially boils down to the fact that the standard error of the mean in this case is inversely proportional to √N, where N is the number of repeat requests the attacker makes for each guess.
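To put rough numbers on that (my own back-of-envelope arithmetic, not the paper's): if the padding is uniform on [0, P] bytes, then

    \sigma_{\mathrm{pad}} = \frac{P}{\sqrt{12}}, \qquad
    \mathrm{SE}(N) = \frac{\sigma_{\mathrm{pad}}}{\sqrt{N}}
    \quad\Longrightarrow\quad
    N \gtrsim \frac{P^{2}}{12\,\Delta^{2}}

to resolve a size difference of Δ bytes. For P = 32 and Δ = 1, that's on the order of a hundred repeat requests per guess, which is why this only "slows" the attack.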
I think it is still discoverable because adding random data leads to the following two distributions:
Attacker gets secret wrong, page is size: original page + (zero to fifty) + length of incorrect secret
Attacker gets secret right, page is size: original page + (zero to fifty)
With a sufficiently high number of observations, the attempt with the right secret produces a mean size that is lower than the attempt with the incorrect secret.
At least that's what it appears to me. I could be wrong.
> Putting random data in the server response will only slow down the attack.
Yes, that's what I mentioned in my post. However, the way I understand "Randomizing secrets per request", it means sending new, random secrets with every request (i.e. generating a new CSRF token for every request).
Changing the CSRF token with each request would work, but you also risk frustrating your users this way. If you have more than one tab open on the same web application, you could only submit a form successfully from the "freshest" of these.
Well, technically you don't need to invalidate the earlier CSRF tokens. Storing a large number of CSRF tokens per user would of course require quite a lot of storage, but maybe you could devise some "clever" scheme, e.g. token = "n" + sha(user_secret + "n"), which would be random enough to prevent BREACH but easy to check.
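A sketch of such a scheme; note I've swapped the plain sha(user_secret + "n") for an HMAC, which is the safer construction for this shape:

    import hashlib, hmac, os

    def make_token(user_secret):
        """Stateless per-response token: n || HMAC(user_secret, n)."""
        n = os.urandom(8).hex()
        mac = hmac.new(user_secret, n.encode(), hashlib.sha256).hexdigest()
        return n + ":" + mac

    def check_token(user_secret, token):
        n, _, mac = token.partition(":")
        good = hmac.new(user_secret, n.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(mac, good)

No token ever repeats across responses, but any previously issued token still verifies, so multiple open tabs keep working.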
Imagine you're going to send a compressed and encrypted message to a friend, and I (the attacker) can do two things:
1) Append a bit to the message before it is compressed and encrypted.
2) See the size of the final message.
So I start by appending the string "4179174b19e0cdc91bf4" to your plaintext message. I see the final encrypted message size is 500 bytes.
Then, I redo the experiment, but this time, I append the string "cschmidt@example.com" to the message. The final encrypted message size is now 480 bytes. The string I injected was the same size, but the compression worked better this time, and I can guess it's because the string I picked is redundant with something in your plaintext.
Mix in a bunch of complicated math and a bit of javascript, and you've got an exploit.
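The size-oracle core of that fits in a few lines of toy code (strings taken from the example above):

    import zlib

    page = b"<html>...email=cschmidt@example.com...</html>"
    for guess in (b"4179174b19e0cdc91bf4", b"cschmidt@example.com"):
        size = len(zlib.compress(page + guess))
        print(guess.decode(), "->", size, "bytes")
    # the matching guess compresses smaller, because DEFLATE emits a short
    # back-reference to the copy of the string already in the page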
This threat isn't specific to Django: it's being billed as a TLS attack, but any encryption system that uses compression the same way is vulnerable.
> Let's let this stew for a while with security researchers doing their
> analysis on various approaches and wait and see what the security community
> as a whole recommends.
>
> My only concern to rushing out a release is that we do something equally
> dumb and end up creating a different problem for our users.
>
> We can roll out fixes as it becomes clear what the consensus is as to the
> best solution for a generalised framework like Rails.
To recap, an application is vulnerable if it meets all three criteria:
* Is served from a server that uses HTTP-level compression
* Reflects user input in HTTP response bodies
* Reflects a secret (such as a CSRF token) in HTTP response bodies
These things are easy to tell about your application, but are much harder for frameworks to detect generally, which is why projects like Django and Rails will take some time to evaluate exactly how to best handle this at the framework level.
I'm not all that familiar with BREACH, so please correct me on the parts I'm wrong about, but it seems that it's an attack that allows one to recover some data sent over TLS if compression on TLS and the protocol level is enabled.
In Django, this means that attackers could recover the CSRF token that's used to prevent cross site requests. This means anyone between you and a client could later have that client automatically make authenticated requests to your app, simply by visiting a site they control, without the knowledge of the user.
To protect yourself, the Django team recommends turning off compression both at the TLS level and at the HTTP level.
The only part you're wrong about is the compression on TLS part, it's not an 'and':
> While CRIME was mitigated by disabling TLS/SPDY compression (and by modifying
> gzip to allow for explicit separation of compression contexts in SPDY),
> BREACH attacks HTTP responses. These are compressed using the common HTTP
> compression, which is much more common than TLS-level compression. This
> allows essentially the same attack demonstrated by Duong and Rizzo, but
> without relying on TLS-level compression (as they anticipated).
and
> It is important to note that the attack is agnostic to the version of
> TLS/SSL, and _does not require TLS-layer compression._
Just because it's short doesn't mean it's simple to understand. Also, the recommendations aren't about how to fix things in Django, but instead a way to prevent this attack from happening. Hardly a long-term solution, especially because it basically asks you to avoid using GZIP for compression, which many people and organizations rely on in order to process timely responses.
The big bold text ("BREACH may be used to compromise Django's CSRF protection") is a strong warning of the threat (becoming vulnerable to CSRF). They list the two steps they recommend: disable the gzip middleware in your settings.py, and disable gzip for responses from your web server.
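Concretely, the settings.py half is just removing (or commenting out) one line; MIDDLEWARE_CLASSES is the relevant setting in current Django:

    # settings.py
    MIDDLEWARE_CLASSES = (
        # 'django.middleware.gzip.GZipMiddleware',  # disabled per the advisory
        'django.middleware.common.CommonMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',
        # ...
    )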
Could somebody help me understand how this attack would be viable?
It seems like the attack has the following requirements:
1. You want a secret that appears in the response body, like a CSRF token.
2. The web server always responds with the exact same response for a given request.
3. The response body contains data that you send to the server, e.g. URL params.
4. The attacker has access to an environment where he can send requests under your browser session (otherwise, the user would be unauthenticated and there would be no secrets to steal).
Given (4.), how is this a real concern? If I, an attacker, am able to make 3000+ requests while logged in under your session and modify the request character by character pre-encryption, doesn't it logically follow that I have your cookies anyway?
The #4 is not that difficult without compromising the user's browser: as long as the user visits a site under your command, or sees some HTML under your command, you can make their browser issue an HTTPS request to anywhere at all.
Maybe you buy some targeted ads served in an iframe. Maybe you send the user an email where his email client either always shows images, or you trick the user into clicking 'display images' with the promise of kittens.
You won't be able to see the results directly, but if you can observe how long the encrypted responses will be, you'll know whether your reflected input could make use of the compression dictionary (meaning your reflected input matches the secret) or not.
I wonder if there is any way to even do this without the passive network snooping -- like some kind of internal browser stats API call that tells you # of HTTPS bytes transferred. It could be innocent enough so it's not protected.
No, it does not logically follow. The attack means that someone who can (for instance) poison the DNS can (for instance) evade CSRF protection for the domains they've poisoned.
Looking at https://github.com/django/django/blob/ffcf24c9ce781a7c194ed8... I'm a little confused about how the csrf-token is generally used in Django -- but if I understand the code correctly, it looks for a cookie with the csrf_token, and compares that to a POSTed value (or x-header in case of an Ajax request).
If the system has a decent random-implementation there is no secret involved, just a (pseudo)random string -- essentially a csrf cookie is given the client on one request, and compared on the next request(s).
Is there any reason one couldn't simply use the rotate_token() function on every (n) request(s)?
Just to make sure I understand this correctly: is this only a security issue if you include sensitive information on a page by default?
For instance, if you had a search field, the contents of what a user puts in that search field will not be compromised. However, if you include a CSRF token with the search field's form, that can be compromised, since it will be there every time the attacker gets the victim to make a request.
I've knocked up a package that provides CSRF token masking and length modification that may help mitigate this. If anyone wants to vet it and submit pull requests, you're more than welcome. https://github.com/lpomfrey/django-debreach
To your second point, security@djangoproject.com. It's documented in a bunch of places; where'd you look for it? I'll add it there, too :)
To the first point, we believe that Django's CSRF protection is as strong as session-linked CSRF protection, and adds CSRF protection to anonymous users (users without a session as well). In other words, it's a design decision, one that we believe doesn't compromise CSRF protection. If you believe otherwise, please get in touch (see above).
1) Sessions and logged-in/out users are two different things. A session is how you store information about the current user, no matter whether they are anonymous or logged in.
2) I checked again, for instance on https://bitbucket.org/ - edit the csrftoken cookie to any value, 123123 for example. Reload the page, and if the site keeps working, then cookie forcing with a MITM can do the same thing by injecting Set-Cookie over http:.
And not only MITM: subdomains can do precisely the same thing. Either Bitbucket uses an old Django, or Django is vulnerable to it (which is, well, a severe vulnerability imo).
1) I know the difference between sessions and logging in. I didn't say anything about logging in; I said that our CSRF protection protects users without sessions. Not all sites use sessions (some for performance reasons, others for privacy reasons); must those sites be vulnerable to CSRF?
2) First, you should report this to Bitbucket: https://www.atlassian.com/security. And c'mon, disclosing a possible CSRF vulnerability on a public board is kinda irresponsible. Is responsible disclosure not something you practice?
Second, I don't know what Bitbucket is running, exactly, and extrapolating from Bitbucket to Django is pretty lazy. Frameworks != sites. Once again, we've spent quite a bit of time validating the design and implementation of Django's CSRF protection, and we believe it works. If you find proof otherwise, can you please send it to security@djangoproject.com, and not post it to Hacker News?
1) only to make sure we are on the same page. Now I see - we have different understanding of "session".
>Not all sites use sessions (some for performance reasons, others for privacy reasons);
what kind of site doesn't use sessions? To track a user, you need a cookie, right?
2) Frameworks != sites.
As I used to think, the framework alone is responsible for CSRF protection, hence I extrapolated. I sent it to security@ as soon as I found that email address. I am trying not to proclaim anything, but some websites from http://www.djangosites.org/ are vulnerable.
Ok, plausible attack I thought up based on what I could suck up from homakov's and various other posts:
1. User is browsing an HTTPS Django site and a HTTP site on WiFi.
2. Hacker is MITMing a connection (on WiFi), cannot decrypt SSL.
3. Changes user's CSRF token for Django via cookie-forcing.
4. Hacker can now use user's other HTTP session, and inject JS (or whatever) so their browser sends stuff to the HTTPS Django site, fully knowing their CSRF token (because we set it) and thus can forge requests easily.
Does Django mitigate something like that already? I think it should be pretty easy to mitigate by using Django's signing to sign the session ID or something into the CSRF token?
An unrelated HTTP session cannot set a cookie for another domain (unless it's a subdomain, in which case you have the more serious issues of session theft or session fixation). The solution to both of these problems is HSTS with the includeSubDomains option.
I've just woken up and I haven't tested it, but off the cuff I believe HSTS will prevent the browser from trusting a plaintext HTTP response at all. So you cannot force a cookie if I understand the blog post correctly. You'd have to create a cookie inside of a verifiable HTTPS connection, which if you can do you've already executed a much worse attack.
EDIT: I described the link as 'first' in the Google results, but that was because Google was being helpful and promoting a page I've visited a lot before... In reality, it's a few links down.
I've never really looked at Django's site before, picked 'community' on the upper right, and it says
> Report potential security issues in Django via private email to
> security@djangoproject.com, and not via Django's Trac instance or the
> django-developers mailing list
The community page (where all the mailing lists and contact addresses are listed) says:
>Report potential security issues in Django via private email to security@djangoproject.com, and not via Django's Trac instance or the django-developers mailing list
Django's CSRF protection is perfectly fine other than issues with BREACH.
In any case where you can edit the CSRF token, you can already execute a much stronger attack (MITM, XSS, etc.). If you have a way to set an arbitrary cookie that doesn't require a much stronger attack (one that already includes the ability to make arbitrary requests without them needing to be cross-origin), then I heartily suggest you report it.
A browser will accept cookies from a HTTP response on a site that has HSTS set?
A CSRF request from a plaintext subdomain would not include a Referer header and would fail Django's CSRF check for lacking one (strict Referer checking is only performed under HTTPS). Furthermore, even a subdomain under TLS would fail to have the same origin.
A subdomain can set cookies, this is true. That requires an XSS on the subdomain, or allowing plaintext responses on subdomains. If you don't prevent both of these, then besides being able to set (or, in the case of XSS, read) the CSRF token, you can also fixate the session, steal the session, perform a DoS using the size of the cookie, etc.
The solution is forced TLS with HSTS and includeSubdomains.
>The solution is forced TLS with HSTS and includeSubdomains.
that's true. Few questions:
1) does django provide HSTS header by default? didn't find it in codebase
2) does Django have includeSubDomains? Bitbucket doesn't have it
3) it is still vulnerable to cookie tossing via subdomain XSS.
Taking into account all the possible ways to replace the token, I think it's Django's duty to couple it with the session (whatever that is in Django) by default. What do you think, should it stay semi-manual?
1) Not currently (and it's unlikely to be able to do so); there's an addon that adds it, and getting it into core has been talked about. I do believe the docs recommend using HSTS and TLS if you want your site secured, and they recommend the addon.
2) The above addon does have it. I have no idea what bitbucket does.
3) Not sure I understand what this means, you mean if there is a valid XSS on a subdomain?
Decoupling CSRF and sessions has been a requirement for us for a while now. We've spoken a little about optionally coupling them: if you have the session framework enabled, it would couple them; if you don't, it falls back on the current method.
P.S. I am not into Django, but if you have a clue how to contact the authors, please tell them to put the CSRF token into the session cookie. It must be fixed in the first place: BREACH is 100 times harder and slower, while cookie forcing is a completely viable attack with an active MITM. Or perhaps it was fixed? I checked it on Bitbucket the last time.
Again, we believe that sessions and CSRF protection can be orthogonal (and that there are benefits to doing so). If you can prove otherwise, let us know!
There's also https://github.com/mozilla/django-session-csrf, an alternate CSRF implementation by Mozilla that does use session-linked CSRF tokens. So if you insist on "tokens must be session-linked", you can use that instead.