Security advisory: Breach and Django (djangoproject.com)
260 points by Lightning on Aug 6, 2013 | 139 comments


Correct me if I'm wrong, but it appears Django isn't the only framework/technology vulnerable to such an attack; they're just one of the first to provide a mitigation strategy (resulting in this post).


Any website that:

  * Is served from a server that uses HTTP-level compression
  * Reflects user input in HTTP response bodies
  * Reflects a secret (such as a CSRF token) in HTTP response bodies
is vulnerable, regardless of technology.

The mitigation strategies were given in the original paper[1]; this announcement is just a repeat of what's in there. That said, it's exactly the right thing to do; that's not a knock on Django.

1: http://breachattack.com/#mitigations


> * Is served from a server that uses HTTP-level compression

This is the only must-have. The last two exist on almost every website (everyone needs a CSRF token, and everyone reflects something somewhere).


My blog does not have CSRF tokens, and does not reflect user input.

Many web _apps_ have these things, but many web sites do not.


From my understanding (which may be wrong...), the requirement "reflect user input in HTTP response bodies" is actually pretty important. If the application only does this for POST requests, then it should probably be fine: since an attacker cannot formulate a valid POST without the CSRF token (assuming the app is using CSRF tokens correctly), there is no way for the attacker to bootstrap the attack.

If the application reflects GET request input in the response (e.g. `https://domain.tld?q=ASDF` results in `value="ASDF"` being included somewhere in the response), then it is indeed likely vulnerable. This lets the attacker simply keep changing the value of `ASDF` as they guess and check for some secret on the page.

Of course, if your application is allowing untrusted POSTs to be made, then you will still have to worry about POST requests...


You're right. Also, to make the compression trick work, ASDF alone is not enough; the attacker needs https://domain.tld?q=value="ASDF so the guess shares the markup immediately preceding the secret.


Since it's an attack on secrets, we don't even consider blogs and read-only media as targets :)


Absolutely. I'm just saying that your statement that it's the only must-have is factually incorrect. I could include some sort of secret on every page, but since my blog doesn't reflect user input, it would be fine. I could also have a 'search' function that reflects input, but without the secret in the body, the secret would be fine.

All three parts are must-haves, even if they are incredibly common. Saying otherwise is misleading.


Don't forget the requirement that the site must use SSL/TLS in the first place!

On the other hand, if you're saying that we just take this for granted these days, you have made me very happy. :-)


Yes, that's correct: in theory BREACH can be used to target any sort of secret embedded in the body of an HTTP response. CSRF tokens are the most common type of secret in that category, but there are others. We can't speak authoritatively for The Web or All Web Frameworks or anything, but we can advise our users on how they can protect themselves.


You are right. The thing is, Django doesn't have an image problem, so it's fine to announce the attack that way.


I'd argue that Django doesn't have an image problem BECAUSE they announce security vulnerabilities this way.


There was an article on Rails, but no official announcement.

https://news.ycombinator.com/item?id=6150535


Currently, the Django templating tag:

  {% csrf_token %}
...results in an insert like...

  <input type="hidden" name="csrfmiddlewaretoken" 
    value="566e4606b2094c7c48e5d04b58236f51">
I suspect that the particular mitigation strategy the BREACH authors describe as "Randomizing secrets per request" could be implemented by having {% csrf_token %} instead emit:

  <input type="hidden" name="random_data" 
    value="91178a84e0bc6e08a2fda853eef2d2c8">
  <input type="hidden" name="csrfmiddlewaretoken_xor" 
    value="e0b594e902c7fe6b1748d13aefaf63aa">
...where the random_data changes with every response, the emitted csrfmiddlewaretoken_xor is the real token XORed with the random_data, and upon submission the server XORs the two values together again to recover the real CSRF token.

There may be other secrets that need protection in other ways, and maybe this would make any random-source issues more exploitable... but this would seem to protect the CSRF token, in a cheap and minimal way.

UPDATE: Thinking further, though, maybe the attacker can probe for both values at the same time, and thus determine the probability of certain pairs, and thus this only slows the attack? I'd appreciate an expert opinion, as this was the first mitigation that came to mind, and if it's wrong-headed I'd like to bash my intuition into better shape with a clue-hammer.


Your UPDATE is right: the attacker can probe a..z twice and just choose the letter that was compressed in both probes, ignoring the chance compressions caused by the random data.


Thanks, but can you clarify: does that mean probing (a..z)×(a..z) (one pair per probe), so there's at least a giant increase in probing required per character? And perhaps even more for each character in, since probing for the Nth character now requires (a..z)^(N-1) × (a..z)×(a..z)?

(I'm guessing also, though, it may be possible to probabilistically probe multiple ranges of the secret at once... in a process that seems vaguely similar to forward-error-correction coding.)


Also, we need a reflector with 'value="' at the beginning: https://twitter.com/homakov/status/364872768921165824/photo/...


Off-topic: here is a very simple mitigation for any website (it requires JS): https://gist.github.com/homakov/6147227


So to be clear:

1) The attacker must be on the same network as you, or at least be able to observe how large the compressed and encrypted replies are.

If you are on the same network, it seems to me there are far more MITM and similar attacks that are more likely to succeed, if you do not use HSTS (or secure DNS, if that helps).

2) The attacker must be able to get your browser to rapidly generate many (how many?) requests from your browser to the site. It takes "30 seconds" they claim, but is that at a rate of 100 requests per second?

3) Each request must carry something that will be reflected in the body of that particular page when it's rendered. I suppose it could be an error message or a search string that's echoed.

It seems to me that unless you generate a CSRF token unconditionally on every page, the subset of pages that both reflect something with no protection (e.g. search results) and have a protected form (e.g. change my email address to XYZ) might be small.

4) The secret that can be extracted is what's in the reply body, not the headers -- headers are not compressed, since TLS-level compression is now universally disabled post-CRIME.

Personally, I use Referer header checking as well. IME all my users' browsers do send it. So even if you extract the CSRF token, it's useless by itself unless you can also make the browser send the right Referer header (and AFAIK all the holes such as Flash have been plugged).

Other than that -- it seems that if you are normally generating e.g. a 32-byte CSRF key, you could interleave it with 32 bytes of good randomness per request?


>Personally, I use Referer header checking as well. IME all my users' browsers do send it. So even if you extract the CSRF token, it's useless by itself unless you can also make the browser send the right Referer header (and AFAIK all the holes such as Flash have been plugged).

do-not-use-referrer-as-csrf-protection.com


I'm sad that doesn't appear to exist.


The parent said "as well", but I'm still insisting that the referrer must be neither the whole of nor a part of CSRF protection.


Sorry, I meant the specific domain you mentioned. :)


This long comment boils down to "so this is an attack that negates the protection that TLS gives you in defending against CSRF attacks". Yes, in the CSRF case, that's what the attack is.


> It takes "30 seconds" they claim, but is that at a rate of 100 requests per second?

I took the 30 seconds to be the amount of compute time required after they had their samples. Otherwise it is a meaningless number: they could say it takes 0.5 seconds at a rate of 6k requests per second, or 3 thousand years at a rate of 1 request per year.


>Other than that -- it seems that if you are normally generating e.g. a 32 byte CSRF key, you could interleave it with 32 bytes of good randomness per request?

Which would be pretty easily averaged out with a few more requests. It makes the attack a little harder, but not substantially so.


One could argue that when talking about security, it's always about making things harder to breach, not achieving foolproof protection...

I'm not sure how adding 32 bytes of "good randomness" would help: random data doesn't compress, so the response would grow by a roughly constant amount, and the slight variation in size caused by the guess would still be very relevant.

However, adding between 1 and 32 bytes of randomness might be a pretty good counter! I.e., if you request the page with "guess letter A", then request the same page again with the same "guess letter A" and get responses that differ by up to 32 bytes, it's very hard to tell whether something compressed better.

The cool thing about that is that it's fairly trivial to do with most implementations of CSRF. Thoughts?


The attacker just needs to work out the average size of the normal response, then the average size of the response with the extra character. Send enough requests and the difference will be noticeable. It slows them down, but doesn't fix the problem.


Regarding #1 and the other attacks that are more likely to succeed, are you referring to the propensity of users to bypass certificate warnings, or is there something else in play there?


Well, the certificate warnings have gotten pretty grave now, though I could imagine showing users an intermediate page telling them Microsoft has screwed up and they may get certificate warnings... which they should just ignore, trust us.

I was just thinking of hijacking DNS locally (i.e., when the browser asks for yourbank.com, send it your IP) and making yourbank.com redirect to yourbank.myfreehost.com -- which could have a legitimate SSL certificate and a copy of all the branding.

That seems more likely to succeed, to me. I guess you could try BREACH if you have some high-value target you know is using a very specific website. Anything bitcoin related: either the users themselves or the admins. Robbing bitcoins is like robbing banks in the Wild West.


> Personally I use Referer header checking as well. IME all the browsers of my users do send them.

IME that is not true. Certain corporate/government machines have Referer headers turned off for some strange "security" reason. I last ran into this when working with the Architect of the Capitol (http://www.aoc.gov/), when they couldn't log into a management panel because Django's CSRF protection checks the Referer header.


In this case, the page at djangoproject.com needs some updates: disabling gzip/SSL compression is not the only option here (and it's not even practical in most cases).


Switch off all gzip..? That feels very extreme; I'm sure there are better workarounds than that one.

EDIT: The following workarounds should be very simple to implement and seem like more viable alternatives for production:

  Length hiding (by adding random amount of bytes 
    to the responses)
  Rate-limiting the requests
Mitigations 6 and 7 taken from http://breachattack.com/
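
As a concrete illustration of length hiding, here is a minimal, hypothetical Django middleware sketch (class name and details are my own invention) that appends a random-length HTML comment to HTML responses. Note the caveat discussed elsewhere in this thread: padding only increases the number of samples an attacker needs; it doesn't stop the attack.

  import binascii, os, random

  class LengthHidingMiddleware(object):
      """Append 0-32 random bytes (hex-encoded) as an HTML comment,
      so identical pages no longer produce identical response sizes."""
      def process_response(self, request, response):
          if response.get('Content-Type', '').startswith('text/html'):
              pad = binascii.hexlify(os.urandom(random.randint(0, 32)))
              response.content += '<!-- %s -->' % pad
          return response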


All these framework-vendor guides will recommend switching off Gzip, because it's a content-neutral workaround; it works everywhere, for every instance of the attack, no matter how you've coded your app. There are more specific workarounds, but they require changing how you encode secrets into your page, so there can't really be a vendor guide on how to do that; the vendor doesn't know how and where your app sticks secrets into its views, after all.


I'm sure everyone is going to come up with workarounds that re-enable compression, but they'll be context-dependent and will involve code; in the meantime, the attack is straightforward and viable. Think of disabling compression as a stopgap.


Definitely think about it before just doing it though...

Disabling compression can break some apps, especially ones that rely on huge compression ratios for text (a 5-10x ratio is common for JSON-heavy responses, for example). So it is not an app-agnostic workaround: a 100 KB JSON response can turn into a 1 MB response. The more data that has to be sent, the more chance of error -- especially on 3G/2G networks.

For many high-end projects, just disabling compression without testing, or without an idea of what the application is doing, would get you fired or taken to court.

Not only would this break apps, but it would also lose business: there is evidence from Amazon and others that every 100 ms of extra latency can cost 1% in sales.

From the SPDY whitepaper: "45 - 1142 ms in page load time simply due to header compression". Remember that headers use the upload side of the link, which means too many headers can saturate the upload and stall the whole internet connection for everyone using it. Common upload limits are only 5-10 KB/second, so excessive headers combined with many requests can easily DoS many internet connections.

I spend a lot of time optimising websites for these reasons, and disabling compression could add 20 seconds of load time for a good percentage of users.

So, for many apps, turning off compression is no solution at all. You might as well just disconnect your app from the internet -- that will also give you a secure and broken app.

A proper risk and impact analysis should be done first. Too often, quick hotfixes to security issues just break things or even make things less secure.


Yeah, turning off compression completely is kind of crude, but it works without having to go into app specifics; I'm sure the Django folks are on it ;). When it comes to things like CSRF tokens, secret masking (4) is probably the easiest to implement? So, something like:

  import uuid

  csrf = '42be455e20e64d7294eee8d1806d14a9'      # the real 128-bit token, as hex
  p = uuid.uuid4().hex                           # random per-response pad
  xord = "%032x" % (int(csrf, 16) ^ int(p, 16))  # mask the token; zero-pad to 32 hex digits
  request_token = "%s%s" % (p, xord)             # ship pad + masked token together
  print "<input type='hidden' name='token' value='%s'>" % request_token

  # on submission: XOR the two halves back together to recover the token
  pad, masked = request_token[:32], request_token[32:]
  v_unxord = "%032x" % (int(masked, 16) ^ int(pad, 16))
  if v_unxord == csrf: print "yay, valid CSRF"   # use a constant-time compare here


Length hiding by adding random noise was shown to be ineffective in the article. Perhaps a fixed-length response would work better -- or one that is heavily quantized? Really, production environments are not the place to try unvetted academic crypto research.


> Really, production environments are not the place to try un-vetted academic crypto research.

A very salient piece of cautionary advice. Disable gzip to protect prod. Figure out what to do to allow compression off prod, and engage the devs of your stack/framework to do this correctly.


You can still safely compress your static files. So, assuming you don't send any secrets in your CSS, JS, etc., you can configure your server to enable gzip only for those resources.

For example, when using nginx with gzip off globally, you can do:

    location /static/ {
        gzip on;
        ...
    }


Would this really work? Won't they be sharing cookies unless the static files are on a different domain?


Doesn't matter if they share cookies. Static files reflect neither user-supplied data nor secrets in their contents, so they can't be used in a BREACH attack.


Additionally, you should be serving static files from a different domain anyway so that you can get the performance gains from cookie-less requests.


Can you point out some more info/sources on these performance gains?



Would it be possible to just have some sort of dynamic compression scheme and not gzip when you'll potentially be transmitting sensitive information?


This. Suppose we randomized gzip levels differently in each response.


Randomizing Gzip levels (and random padding) only increases the number of samples you need to take.


We could also create one-time pads for the tokens, so the number of samples needed would ramp up quickly enough to make this kind of attack infeasible.

And I guess we could tweak gzip's Huffman tables so that user input compresses poorly compared to the rest of the page content.


Adding a random number of bytes to responses may break your caching, in a variety of ways.


Unlikely to affect you if you're already embedding CSRF tokens in your responses, which would defeat caching anyway. Curious whether this response-length fiddling will mitigate the attack -- can anyone more knowledgeable than me confirm?


Interesting point, that CSRF tokens break caching. People can store the token in a cookie and not put it into the actual HTML.


A few days ago, Meldium's announcement of a Ruby gem that provides inexpensive partial protection (i.e. without disabling gzip) made it to the HN front page:

http://blog.meldium.com/home/2013/8/2/running-rails-defend-y...

The two protective measures are masking the Rails CSRF token and appending an HTML comment to every HTML doc to slow down plaintext recovery. How easy would this be to package as a Django plugin?


Is a partial workaround really better than a guaranteed workaround?


This would cause a big problem for us: our mobile web service serves around 3-4k concurrent requests on average, and without compression our API would see a 300% - 900% increase in delay.

Are there any alternatives? I would also like to know what CloudFlare will do, as their CDN is based on compressed nginx responses.


As gzip compression only applies to the content of the page, not the headers, I would assume that prefixing your page with content that is variably compressible and of varying length would throw a monkey wrench into attacks like this.

The compressed length of any part of a page very much depends on what came before it. Altering the content to include a script comment block full of random text and various common HTML and JavaScript elements (Markov chains, anyone?) would definitely change how a page is compressed.

If the compressed length of the replies varies significantly with every request - even if the request content is identical - attacks like this can no longer reveal hidden information.

Edit:

You could improve this significantly by including false-positive matches as well. If your HTML content has csrf="45a7..." in it, you could hash that content into enough material to generate 19 or so identical-looking code blocks embedded in a script comment. You've now given the attacker a 95% chance of attacking the wrong one / increased the number of attempts they'll need by 20x.

This method (minus the above part) would actually be cacheable by smart CDNs like Cloudflare.
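
A minimal sketch of the decoy idea (function name and details are hypothetical): deriving the decoys from a hash of the page content keeps them stable for a given page, so the response stays cacheable.

  import binascii, hashlib

  def decoy_tokens(page_body, n=19):
      # derive n stable, token-shaped decoys from the page content itself;
      # the same page always yields the same decoys, so caches still work
      decoys, seed = [], page_body
      for _ in range(n):
          seed = hashlib.sha256(seed).digest()
          decoys.append(binascii.hexlify(seed[:16]))  # 32 hex chars, like a real token
      return decoys

The real token would then be embedded among these in markup the attacker can't distinguish from the genuine field.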


Random padding can be averaged out. It increases the work factor of the attack, but not by much.


The attacker has to be able to issue requests on behalf of the user with injected "canary" strings. I fail to see a practical exploit where one can do this and wouldn't have access to the secret in the response anyway. What am I missing?


Does any GET or POST URI endpoint in your application accept parameters? Do none of those parameters impact the output of the application? That set of circumstances is extraordinarily common.


The request has to be issued by the attacker from the victim's browser. If the attacker can do that, why can't he read the response to that request?

Edit: I think I can see a scenario where a third-party website makes these requests via an <iframe> or an <img>. I'm not sure there's a way to do a POST quite as easily.


Do you understand how CSRF works? Just think of it in terms of CSRF. Since the attacker is trying to infer page content, they don't care that the server rejects all the probing requests, so CSRF protection doesn't help you as the attacker carries out the BREACH/CRIME stuff. If the result of the attack is an inferred CSRF token, they then cap the whole exploit off with a (now working) actual CSRF attack.


I understand how the attack works, the question was about how a practical exploit would actually be carried out. I've figured out how one would issue GET requests from the right environment, but I don't know if the same is possible for POST.


It is just as possible. A POST CSRF exploit adds maybe two to three minutes for the attacker to craft the request differently.


Just in case you weren't clear on this already: CSRF works just fine against POST endpoints. Think JavaScript.


Would it be possible to thwart this attack (BREACH) by issuing a fresh CSRF token for each request?


Yes.


Oracle padding


I'm in the same boat -- if the attacker can inject strings into requests pre-compression, wouldn't the client already be compromised?


No, you're missing that the original GET requests can in some cases be triggered from plain HTTP, either by forgery or by surreptitiously spoofing the user's own browser into making them. There's no need to have compromised SSL/TLS.


How about having the CSRF token change with each request? If it's encrypted/signed by the server for each request with a random IV, it would be different in each response. It would be a bit more processing on the server (a decrypt vs. just an HMAC verify), but it would be completely different each time. It seems kind of belt-and-suspenders, since you're encrypting data within an encrypted channel, but I think it gets around this issue.
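
A sketch of that idea using the `cryptography` package's Fernet recipe (my choice of library, not something from the advisory; any authenticated encryption with a random IV gives the same property):

  from cryptography.fernet import Fernet

  key = Fernet.generate_key()   # server-side secret, never sent to the client
  f = Fernet(key)

  csrf = b'42be455e20e64d7294eee8d1806d14a9'
  tok1 = f.encrypt(csrf)        # random IV: fresh ciphertext on every call
  tok2 = f.encrypt(csrf)
  assert tok1 != tok2                                  # differs per response
  assert f.decrypt(tok1) == f.decrypt(tok2) == csrf    # same underlying token

Because the bytes on the page change every response, there is no stable secret for the compression oracle to converge on.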


  >  We offer several tactics for mitigating the attack. 
  > 
  > * Randomizing secrets per request 
http://breachattack.com/#mitigations


If the CSRF token changes with each page view then opening a second page (perhaps an explanation for a form field) in a new tab/window would invalidate the form in the original tab/window.


Not necessarily. The token can be used simply to verify that the request came from a legit page and not a cross-site request. The encrypted CSRF token need only be checked by the server to see that it hasn't expired. The server can store the expiration in the CSRF token itself (encrypted and signed); it does not need to maintain a list of issued CSRF tokens.

I wrote about this a little while back. Comments are here: https://news.ycombinator.com/item?id=5971464
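
For the expiry-inside-the-token part, Django's own signing utilities can already do something similar; a signed-but-not-encrypted sketch (the payload and max_age here are illustrative):

  from django.core import signing

  token = signing.dumps({'purpose': 'csrf'})   # timestamped and signed
  data = signing.loads(token, max_age=3600)    # raises SignatureExpired after 1 hour

No server-side storage is needed; the timestamp travels inside the token.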


Wouldn't the lifetime of the token have to be under 30 seconds, according to the claims made in the paper?


I don't think so. Here's the snippet from the linked PDF[1]:

> DEFLATE [2] (the basis for gzip) takes advantage of repeated strings to shrink the compressed payload; an attacker can use the reflected URL parameter to guess the secret one character at a time.

By encrypting the CSRF token (or any other "secret" data you want to round-trip from server to client and back) with a random IV per request, this wouldn't work. The value sent by the client would not be the same as the new token generated by the server (since each has a random IV). Even though the decrypted value of each token is the same, the values presented to the client in the response body are different each time and not predictable (to the client).

[1]: http://breachattack.com/resources/BREACH%20-%20SSL,%20gone%2...


Maybe only generate it for views of pages with a form on them?


Has anyone looked at mitigating the attack by changing the behavior of chunked transfer encoding?

Chunked transfer encoding is basically padding that a server can easily control, without having to change the content or behavior of the backend application. A web server could easily insert an order of magnitude more chunks and place them randomly in the response stream.


I'm not sure I fully understand the proposed fix; how does it differ from the application simply including random chunks of data inside the response?

This area isn't my strong suit, but assuming this is analogous to just adding random data to the response, I believe it can be worked around by making more requests and using statistics to factor out the introduced noise.

If my understanding is wrong then excuse me :)


Yes, it's functionally the same.

The difference is that it is extremely easy to add at the HTTP proxy or load balancer level, and it could potentially turn 30 seconds into hours.

I would love a way to work out the math on how many bytes of random response-length variation translate into how many additional requests.


I have seen random workarounds at the app level as well, where the app adds a random-length HTML comment at the end of the page.

But if randomness can be statistically removed, then they shouldn't add a random amount. Maybe just track the max size of the returned response and always pad up to that max size, so the lengths of all the pages are always the same. This is still better than turning off compression completely. A typical max for a detail page in an app might just be the size of the page plus 256 bytes per app output field.


You can add as many or as few bytes as you like by abusing the chunk-extension in chunked encoding:

http://tools.ietf.org/html/rfc2616#section-3.6.1

So you could round all HTTP responses up to 128-byte boundaries by appending 1 to 128 bytes at the end of every response.

Effectively it gives you length hiding at the HTTP layer; still attackable.
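
The rounding arithmetic is simple; a quick sketch (pure illustration):

  def pad_to_block(body_len, block=128):
      # 1..block filler bytes; the padded length is the next multiple of block
      return block - (body_len % block)

  print pad_to_block(1000)   # 24  -> padded length 1024
  print pad_to_block(1024)   # 128 -> padded length 1152 (never zero filler)

Quantizing rather than randomizing means repeated identical requests give identical sizes, so the padding can't simply be averaged away, though guesses that push the response across a bucket boundary still leak.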


So, it seems that even if I encrypt everything, a lot of information is still present in the size of the encrypted message: in the case of VOIP, it's possible to guess the speech being transferred over an encrypted transport, and in the case of text, it's possible to figure out secrets if the attacker can modify an equally-sized part of the message.

Is there any general way of preventing this kind of attack? Inserting random data could work, but its distribution would have to be exactly right for the attack to be impossible over longer periods of time. For the BREACH case, we could solve it by not compressing user input, but what about the VOIP case?

Also, why does the site http://breachattack.com/ say that "Randomizing secrets per request" is less effective than disabling compression?


> Is there any general way of preventing this kind of attacks?

Disabling compression is a 100%-effective countermeasure for compression oracle attacks.

> Also, why does the site http://breachattack.com/ says that "Randomizing secrets per request" is less effective than disabling compression?

Putting random data in the server response will only slow down the attack. With enough requests, the noise from that random data will wash out.

Disabling compression will stop the attack cold. The whole thing is predicated on analyzing the size of the compressed text. No compression, no compression oracle.


How is it that random data would only slow it down? If I add a random field of variable length to my response -- say, 0 to 50 completely random characters -- it should completely throw off this attack. The length of the output will change from request to request.

I suppose, given infinite time, you could send the same request over and over, map the variance of the content lengths, and get an idea of what the actual content length was before the random padding? But the compression seems to throw that off even more, AFAIK -- because the padding data is random, it could very well happen to compress well against the rest of the data in the response, further throwing off any guesses.

Edit: From the PDF on breachattack.com:

  While this measure does make the attack take longer, it does so only slightly.
  The countermeasure requires the attacker to issue more requests, and measure the
  sizes of more responses, but not enough to make the attack infeasible. By repeating
  requests and averaging the sizes of the corresponding responses, the attacker can
  quickly learn the true length of the ciphertext. This essentially boils down to the
  fact that the standard error of the mean in this case is inversely proportional
  to √N, where N is the number of repeat requests the attacker makes for each guess.


I think it is still discoverable, because adding random data leads to the following two distributions:

Attacker gets the secret wrong; page size: original page + (zero to fifty) + length of the incorrect secret.

Attacker gets the secret right; page size: original page + (zero to fifty).

With a sufficiently high number of observations, the responses with the right secret have a mean size lower than those with an incorrect secret.

At least that's what it appears to me. I could be wrong.
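
A toy simulation of that argument (all numbers invented): even with 0-50 bytes of random padding, the two means separate once you average enough samples.

  import random

  BASE = 1000      # hypothetical compressed size when the guess is right
  PENALTY = 20     # extra bytes when the wrong guess fails to compress away

  def observe(correct):
      return BASE + random.randint(0, 50) + (0 if correct else PENALTY)

  N = 5000
  right = sum(observe(True) for _ in range(N)) / float(N)
  wrong = sum(observe(False) for _ in range(N)) / float(N)
  print right, wrong   # ~1025 vs ~1045: the padding noise averages out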


> Putting random data in the server response will only slow down the attack.

Yes, that's what I mentioned in my post. However, the way I understand "Randomizing secrets per request", it means sending a new, random secret with every request (i.e. generating a new CSRF token for every request).


Changing the CSRF token with each request would work, but you also risk frustrating your users this way: if you have more than one tab open on the same web application, you can only submit a form successfully from the "freshest" of them.


Well, technically you don't need to invalidate the earlier CSRF tokens. Saving a big number of CSRF tokens per user would of course require quite a lot of storage, but maybe you could devise some "clever" scheme, e.g. token = n + sha(user_secret + n) for a counter n, which would be random enough to prevent BREACH but easy to check.


You'd want to have the secret last in the hash; otherwise you're open to hash length-extension attacks in some rare cases.
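
A minimal sketch of that kind of scheme (names are hypothetical), sidestepping length extension entirely by using HMAC with a random nonce:

  import hashlib, hmac, os
  from binascii import hexlify

  def make_token(user_secret):
      n = hexlify(os.urandom(16))   # fresh 32-hex-char nonce per response
      return n + hmac.new(user_secret, n, hashlib.sha256).hexdigest()

  def check_token(user_secret, token):
      n, mac = token[:32], token[32:]
      expected = hmac.new(user_secret, n, hashlib.sha256).hexdigest()
      return hmac.compare_digest(mac, expected)   # constant-time comparison

Every response carries a different-looking token, but the server can verify any of them statelessly.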


I'm sure it will come, but I'd appreciate a layman's terms explanation of this. What is the threat, and how do you go about fixing things in Django?


Imagine you're going to send a compressed and encrypted message to a friend, and I (the attacker) can do two things:

1) Append a bit to the message before it is compressed and encrypted.

2) See the size of the final message.

So I start by appending the string "4179174b19e0cdc91bf4" to your plaintext message. I see the final encrypted message size is 500 bytes.

Then, I redo the experiment, but this time, I append the string "cschmidt@example.com" to the message. The final encrypted message size is now 480 bytes. The string I injected was the same size, but the compression worked better this time, and I can guess it's because the string I picked is redundant with something in your plaintext.

Mix in a bunch of complicated math and a bit of javascript, and you've got an exploit.

This threat isn't specific to Django: it's being billed as a TLS attack, but any encryption system that uses compression the same way is vulnerable.
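
A toy demonstration of that size oracle with zlib (reusing the strings from the example above; the plaintext is made up):

  import zlib

  plaintext = b"To: cschmidt@example.com -- the rest of the page body..."

  wrong = plaintext + b"4179174b19e0cdc91bf4"   # novel bytes compress poorly
  right = plaintext + b"cschmidt@example.com"   # repeats earlier text, so DEFLATE
                                                # emits a cheap back-reference
  print len(zlib.compress(wrong))   # larger
  print len(zlib.compress(right))   # smaller -> the guess matched something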


Here's a simple explanation: http://arstechnica.com/security/2013/08/gone-in-30-seconds-n...

It's not Django, but over at Rails we have been discussing various parts of BREACH and how we'll handle it: https://github.com/rails/rails/pull/11729

The important two comments are here: https://github.com/rails/rails/pull/11729/#issuecomment-2206... and https://github.com/rails/rails/pull/11729/#issuecomment-2208...

  > Let's let this stew for a while with security researchers doing their
  > analysis on various approaches and wait and see what the security community
  > as a whole recommends.
  >
  >  My only concern to rushing out a release is that we do something equally 
  >  dumb and end up creating a different problem for our users. 
  > 
  > We can roll out fixes as it becomes clear what the consensus is as to the
  > best solution for a generalised framework like Rails.
As http://breachattack.com/ says, a vulnerable app must:

  * Be served from a server that uses HTTP-level compression
  * Reflect user-input in HTTP response bodies
  * Reflect a secret (such as a CSRF token) in HTTP response bodies
These things are easy to determine about your own application, but much harder for frameworks to detect in general, which is why projects like Django and Rails will take some time to evaluate exactly how best to handle this at the framework level.


I'm not all that familiar with BREACH, so please correct me on the parts I'm wrong about, but it seems that it's an attack that allows one to recover some data sent over TLS if compression on TLS and the protocol level is enabled.

In Django, this means that attackers could recover the CSRF token that's used to prevent cross-site requests. Anyone between you and a client could then have that client automatically make authenticated requests to your app, simply by having them visit a site the attacker controls, without the user's knowledge.

To protect yourself, the Django team recommends turning off compression both at the TLS level and at the HTTP level.


The only part you're wrong about is the compression on TLS part, it's not an 'and':

  > While CRIME was mitigated by disabling TLS/SPDY compression (and by modifying
  > gzip to allow for explicit separation of compression contexts in SPDY),
  > BREACH attacks HTTP responses. These are compressed using the common HTTP
  > compression, which is much more common than TLS-level compression. This
  > allows essentially the same attack demonstrated by Duong and Rizzo, but
  > without relying on TLS-level compression (as they anticipated).
and

  > It is important to note that the attack is agnostic to the version of
  > TLS/SSL, and _does not require TLS-layer compression._
breachattack.com


The link is like 5 sentences long and 2 of them are recommendations for stopping the attack.


Just because it's short doesn't mean it's simple to understand. Also, the recommendations aren't about how to fix things in Django; they're a way to prevent the attack from happening at all. Hardly a long-term solution, especially because it basically asks you to stop using gzip compression, which many people and organizations rely on for timely responses.


This advisory is pretty understandable, I think.

The big bold text ("BREACH may be used to compromise Django's CSRF protection") is a strong warning of the threat (becoming vulnerable to CSRF). They list two steps that they recommend taking: disable the gzip middleware in your settings.py, and disable gzip for responses from your web server.
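
For concreteness, the settings.py side of that advice looks roughly like this (a sketch; the middleware list is abbreviated):

  # settings.py -- ensure gzip is not applied to dynamic responses
  MIDDLEWARE_CLASSES = (
      # 'django.middleware.gzip.GZipMiddleware',   # removed per the advisory
      'django.middleware.common.CommonMiddleware',
      'django.middleware.csrf.CsrfViewMiddleware',
      # ...
  )

The web-server half means e.g. `gzip off;` in nginx for the locations that serve Django.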


This attack works very much like the game Mastermind. http://en.wikipedia.org/wiki/Mastermind_(board_game)


That's true of almost all side-channel attacks using user-controlled input (e.g. padding oracle attacks).


Could somebody help me understand how this attack would be viable?

It seems like the attack has the following requirements:

  1. You want a secret that appears in the response body, like a 
     CSRF token.

  2. The web server always responds with the exact same response 
     for a request.

  3. The response body contains data that you send to the server, 
     e.g. url params.

  4. The attacker has access to an environment where he can send requests 
     under your browser session (otherwise, the user would be
     unauthenticated and there would be no secrets to steal).
Given (4), how is this a real concern? If I, an attacker, am able to make 3000+ requests under your session, modifying the request character by character pre-encryption, doesn't it logically follow that I have your cookies anyway?


#4 is not that difficult without compromising the user's browser -- as long as the user visits a site under your command, or sees some HTML under your command, you can make the browser issue an HTTPS request to anywhere at all.

Maybe you buy some targeted ads served in an iframe. Maybe you send the user an email where his email client either always shows images, or you trick the user into clicking 'display images' with the promise of kittens.

You won't be able to see the responses directly, but if you can observe how long the encrypted responses are, you'll know whether your reflected input could make use of the compression dictionary (meaning your reflected input matches the secret) or not.

I wonder if there is any way to do this without even the passive network snooping -- like some kind of internal browser stats API that tells you the number of HTTPS bytes transferred. It might seem innocent enough that it's not protected.


No, it does not logically follow. The attack means that someone who can (for instance) poison the DNS can (for instance) evade CSRF protection for the domains they've poisoned.


Looking at https://github.com/django/django/blob/ffcf24c9ce781a7c194ed8... I'm a little confused about how the CSRF token is generally used in Django -- but if I understand the code correctly, it looks for a cookie with the CSRF token and compares that to a POSTed value (or an X- header in the case of an Ajax request).

If the system has a decent random implementation, there is no secret involved, just a (pseudo)random string -- essentially, a CSRF cookie is given to the client on one request and compared on the next request(s).

Is there any reason one couldn't simply call the rotate_token() function on every (nth) request?


Just to make sure I understand this correctly: is this only a security issue if you include sensitive information on a page by default?

For instance, if you had a search field, the contents of what a user puts in that search field would not be compromised. However, if you include a CSRF token with the search field's form, that can be compromised, since it will be there every time the attacker gets the victim to make a request.


I've knocked up a package that provides CSRF token masking and length modification, which may help mitigate this. If anyone wants to vet it and submit pull requests, you're more than welcome: https://github.com/lpomfrey/django-debreach


Django's CSRF token isn't linked to the session, so an attacker who can force a csrftoken cookie (e.g. via MITM on plain HTTP, or from a subdomain) can defeat the protection. And I was trying to report it, but didn't find a contact/email.


To your second point, security@djangoproject.com. It's documented in a bunch of places; where'd you look for it? I'll add it there, too :)

To the first point: we believe that Django's CSRF protection is as strong as session-linked CSRF protection, and it adds CSRF protection for anonymous users (users without a session) as well. In other words, it's a design decision, one that we believe doesn't compromise CSRF protection. If you believe otherwise, please get in touch (see above).


1) Sessions and logged-in/out users are two different things. A session is the way you store information about the current user, no matter whether they are anonymous or logged in.

2) I checked again, for instance, on https://bitbucket.org/ -- edit the csrftoken cookie to anything, 123123 for example. Reload the page, and if the site keeps working, then cookie forcing with a MITM can do the same thing using an http-injected Set-Cookie.

And not only MITM; subdomains can do precisely the same thing. Either Bitbucket uses an old Django, or Django is vulnerable to it (which is, well, a severe vulnerability IMO).


1) I know the difference between sessions and logging in. I didn't say anything about logging in; I said that our CSRF protection protects users without sessions. Not all sites use sessions (some for performance reasons, others for privacy reasons); must those sites be vulnerable to CSRF?

2) First, you should report this to Bitbucket: https://www.atlassian.com/security. And c'mon, disclosing a possible CSRF vulnerability on a public board is kinda irresponsible. Is responsible disclosure not something you practice?

Second, I don't know what Bitbucket is running, exactly, and extrapolating from Bitbucket to Django is pretty lazy. Frameworks != sites. Once again, we've spent quite a bit of time validating the design and implementation of Django's CSRF protection, and we believe it works. If you find proof otherwise, can you please send it to security@djangoproject.com, and not post it to Hacker News?


1) Only to make sure we are on the same page: now I see we have different understandings of "session".

>Not all sites use sessions (some for performance reasons, others for privacy reasons);

What kind of site doesn't use sessions? To track a user you need a cookie, right?

2) Frameworks != sites. As I used to think, only the framework is responsible for CSRF protection; hence I extrapolated. I sent it to security@ as soon as I found that email. I am trying not to proclaim anything, but some websites from http://www.djangosites.org/ are vulnerable.


Could you clarify what can actually be done with this? From what I can see, you can't do XSS, because it's escaped.

I guess there's the chance that you could do CSRF, because you've essentially "set" their CSRF token?


XSS? Unrelated here, I think.

Exactly: cookie forcing/tossing = "setting" their CSRF token.


OK, here's a plausible attack I thought up based on what I could glean from homakov's and various other posts:

1. The user is browsing an HTTPS Django site and an HTTP site on WiFi.

2. The hacker is MITMing the connection (on WiFi) and cannot decrypt SSL.

3. The hacker changes the user's CSRF token for the Django site via cookie forcing.

4. The hacker can now use the user's other HTTP session and inject JS (or whatever) so their browser sends requests to the HTTPS Django site; since the hacker knows the CSRF token (because they set it), they can forge requests easily.

Does Django mitigate something like that already? I think it should be pretty easy to mitigate by using Django's signing to sign the session ID (or something) into the CSRF token.


An unrelated HTTP session cannot set a cookie for another domain (unless it's a subdomain, in which case you have the more serious issue of session theft or session fixation). The solution to both of these problems is HSTS with the includeSubDomains option.


1) Simulate a request to http(NOT SECURE)://site.com. 2) Simulate the response with Set-Cookie: csrf_token=123123;

> unless it's a subdomain in which you have the more serious issue of session theft or session fixation

Elaborate, please? From what I know, a subdomain can only do the same attack (cookie tossing). Are you talking about phishing?


Except we can set the cookie because we're already doing that to HTTPS requests via cookie-forcing.


I've just woken up and I haven't tested it, but off the cuff I believe HSTS will prevent the browser from trusting a plaintext HTTP response at all. So you cannot force a cookie, if I understand the blog post correctly. You'd have to set the cookie inside a verifiable HTTPS connection, and if you can do that, you've already executed a much worse attack.


> but off the cuff I believe HSTS will prevent the browser from trusting a plaintext HTTP response at all

Cookies are broken (I write about it on my blog, like, daily). The essential idea of forcing is injecting cookies into HTTPS space from HTTP.


I think this is what it hinges on:

http://scarybeastsecurity.blogspot.com.es/2008/11/cookie-for...

Is that possible under HSTS?


@homakov, so you can get a browser to trust a cookie from an HTTP response under a domain protected by HSTS?


Doesn't HSTS just mean that the browser will always make HTTPS requests?


>And i was trying to report it, but didn't find a contact/email.

How hard did you try? Django uses the industry standard security@ address for reporting security issues.

A quick googling turns up this page pretty easily: https://docs.djangoproject.com/en/1.5/internals/security/

EDIT: I described the link as 'first' in the Google results, but that was because Google was being helpful and promoting a page I've visited a lot before... In reality, it's a few links down.


1) Open https://www.djangoproject.com/. 2) Look for /contact/, /email/, a user group.


I've never really looked at Django's site before; I picked 'community' in the upper right, and it says:

  > Report potential security issues in Django via private email to 
  > security@djangoproject.com, and not via Django's Trac instance or the 
  > django-developers mailing list


The community page (where all the mailing lists and contact addresses are listed) says:

>Report potential security issues in Django via private email to security@djangoproject.com, and not via Django's Trac instance or the django-developers mailing list


Django's CSRF protection is perfectly fine, other than the issues with BREACH.

In any case where you can edit the CSRF token, you can already execute a much stronger attack (MITM, XSS, etc.). If you have a way to set your own arbitrary cookie that doesn't require a much stronger attack -- one that already includes the ability to make arbitrary requests without them needing to be cross-origin -- then I heartily suggest you report it.


> In any case that you can edit the CSRF token you already can execute a much stronger attack (MITM, XSS, etc).

MITM is not a problem for HTTPS, but it can force cookies. A subdomain can do the same thing. These preconditions are common.


A browser will accept cookies from an HTTP response on a site that has HSTS set?

A CSRF request from a plaintext subdomain would not include a referrer header and would fail Django's CSRF check for lacking one (strict referrer checking is only enabled under HTTPS). Furthermore, even a subdomain under TLS would fail to have the same origin.

A subdomain can set cookies, this is true. But this requires an XSS on the subdomain, or allowing plaintext responses on subdomains. If you do not prevent both of these, then besides being able to set (or, in the case of XSS, read) the CSRF token, you can also fixate the session, steal the session, perform a DoS using the size of the cookie, etc.

The solution is forced TLS with HSTS and includeSubdomains.


Hm, how can XSS on a subdomain "steal the session, perform a DoS using the size of the cookie"?


>The solution is forced TLS with HSTS and includeSubdomains.

That's true. A few questions: 1) Does Django provide the HSTS header by default? I didn't find it in the codebase. 2) Does Django have includeSubDomains? Bitbucket doesn't have it. 3) It is still vulnerable to subdomain-XSS tossing.

Taking into account all the possible ways to replace the token, I think it's Django's duty to couple it with the session (whatever that is in Django) by default. What do you think; should it stay semi-manual?


1) Not currently (and it's unlikely to be able to); there's an addon that adds it, and getting it into core has been talked about. I do believe the docs say to use HSTS and TLS if you want your site secured, and recommend the addon.

2) The above addon does have it. I have no idea what Bitbucket does.

3) Not sure I understand what this means; you mean if there is a valid XSS on a subdomain?

Decoupling CSRF and sessions has been a requirement for us for a while now. We've spoken a little about optionally coupling them: if you have the session framework enabled, it would couple them, but if you don't, it falls back on the current method.


P.S. I am not into Django, but if you have a clue how to contact the authors... please tell them to put the CSRF token into the session cookie. It must be fixed in the first place: BREACH is 100 times harder and slower, while cookie forcing is a completely viable attack with an active MITM. Or perhaps it was fixed? I last checked it on Bitbucket.


Again, we believe that sessions and CSRF protection can be orthogonal (and that there are benefits to keeping them so). If you can prove otherwise, let us know!

There's also https://github.com/mozilla/django-session-csrf, an alternate CSRF implementation by Mozilla that does use session-linked CSRF tokens. So if you insist on "tokens must be session-linked", you can use that instead.


Sorry, I think in terms of Rails; in Rails a session is a _site_sess cookie... I am not sure how it works in Django, but here is a post about it: http://homakov.blogspot.com/2013/06/cookie-forcing-protectio...

https://github.com/mozilla/django-session-csrf seems OK; it should be the default.


"i am not sure how it works in Django"

Perhaps you should do a little research before proclaiming things insecure?


Bitbucket is vulnerable > Django has a problem.

If that's not enough:

Some websites from http://www.djangosites.org/ are vulnerable > Django has a problem.


And yes, they clearly state it was made as a solution to cookie forcing:

>Your site is on a subdomain with other sites that are not under your control, so cookies could come from anywhere.

It should be the default, for sure.


Rails is not vulnerable to cookie forcing, btw. But your authentication can be (Devise bug: http://blog.plataformatec.com.br/2013/08/csrf-token-fixation...)


Disable compression altogether? That's craptastic.



