The Trouble with Tor (cloudflare.com)
323 points by jgrahamc on March 30, 2016 | 172 comments



"To make sure our team understood what a pain CAPTCHAs could be, I blacklisted all the IP addresses used in CloudFlare's office so our employees would need to pass a CAPTCHA every time they wanted to visit any of our customers' sites."

Say what you will about cloudflare, that's an impressive move.


Impressive, yes, but I'm going to hazard a guess that they didn't route all of that traffic through Tor to experience the CAPTCHA with the bandwidth constraints imposed by using an exit node.

The CAPTCHAs are (mostly) easy to solve, but all of the ones I was presented with were "pick the right one out of 9 different images" and loading the entire CAPTCHA in Tor Browser took several seconds (and many revealed a new image after clicking one of the 9). This is then repeated at least once (I received three on one site, I'm guessing because I didn't know if the picture was a store front or just the front of some building). After completing the challenge I was given a connection error and had to repeat the entire thing again in one case.

There are much lower bandwidth CAPTCHAs out there and those should be favored over these large image-based ones for connections originating from the block of addresses represented by Tor exit nodes.


The entire CloudFlare recaptcha page is < 100 KB, and that's including the images. If you're annoyed by the time it takes for that page to load, you won't be happy browsing any site via Tor - even Google is > 200 KB.


You're right, they're not really that big, but they load extraordinarily slowly. Each loads one at a time from upper left to lower right, and you can watch them download -- they don't just "appear", they show up like they would on a slow dial-up connection.

Once you get onto the site, it loads more slowly than in a non-Tor connection, but a news site I hit loaded everything at about the same speed as the little reCAPTCHA form, so I'm left wondering if it's something related to reCAPTCHA.


Probably just bad luck with your circuits and their latency to the image host. Tested this with a small number of circuits and didn't notice anything loading significantly faster or slower compared to other sites.


So your excuse for doubling the time it takes to get to a website is that someone's connection is already slow? People don't use Tor because it's fast, but that's no reason to punish them even more.


You're putting words in my mouth. I merely pointed out that the reCAPTCHA page is not in any way a bandwidth hog. Whether you think captchas are an appropriate tool to filter out abuse from Tor users is a different discussion altogether, my point is that if you're going to have any kind of captcha, then the one CloudFlare is using is probably smaller in size than most other pages you might visit.


It's not so much a bandwidth hog as a request hog.


This is a crucial point: new domains mean setting up new circuits, which is the real delay here.

It's like the RTT problem of blocking HTTP loads, magnified by 10.


AFAIK TBB doesn't create per-domain circuits for every subresource on a site (that would kill performance on many sites with 10s of third-party trackers, CDN hosts, etc.), but rather one circuit per "URL-bar domain". That domain doesn't change when a CloudFlare site renders reCAPTCHA.


But it is a bandwidth hog because it's purely overhead (in terms of bandwidth and latency) added to every page behind cloudflare that a TBB user visits.


How much of each is cached?


Both requested with a clean cache using Tor Browser Bundle. Not sure if TBB does any caching at all anyway, seems like that would make fingerprinting easier, but I haven't checked. FWIW, the actual captcha image is ~20 KB.


Eugh, I hate the image based CAPTCHAs, they take way more mental power than the old text based ones did and often take me 2 or 3 tries to get right. Is there a way to permanently opt back to the text based ones?


I'm guessing you aren't talking about the ones displayed to Tor users before. The text-based ones presented to Tor users with Javascript disabled were almost impossible to answer correctly. The CAPTCHAs still irritate me, but the image-based ones are far superior to what I was being given previously.


Glad that I'm not the only one who dislikes the new style. They do have one advantage - they seem to be solvable 100% of the time (at least for now). In every other aspect though I find them to be way more annoying than the old ones. They're slower to load, take longer to solve, require much more concentration, and have extremely variable difficulty. The worst are the ones which replace each image you click with a new one, requiring another 5-10 seconds for loading.


I've gotten one or two wrong before. Sometimes the questions aren't well defined for a given picture.

For example, the question was "Does this have a river in it?" with a picture of the Grand Canyon, where you couldn't quite see down to the Colorado River.


I find them easier (and thus prefer them) than the gibberish "words" CAPTCHAs Google uses that deem me a machine time and time again.


Disabling JavaScript goes back to the older "squiggly words" style in my experience. I have no idea if that happens with CloudFlare though.


Unfortunately, it doesn't ... I wish it did.

You now get a page with images that have checkboxes next to them. When you submit the form, you get a Base64 key to paste into a text box.


Hmm, wonder if they're doing this to try to get around captcha solver services... this seems easy enough to construct into a single image with numbers that you could tell someone to enter via a solver service though.


What he didn't say was that he did it without warning and it lasted 30 days.

Imagine what that was like for the technical support and customer success teams who were helping customers with their sites.


> Imagine what that was like for the technical support and customer success teams who were helping customers with their sites.

It sucks. But that's exactly the point. Using the internet is a mission critical thing for many people and for some that means using Tor Browser or similar to get around oppressive governments. This sounds like a really effective way to make sure the things you're doing are impacting your customers minimally. I bet you guys screamed loudly and were heard more clearly than a random customer trying to use Tor day-to-day.


I am a strong believer in dogfooding. If it wasn't a pain you felt, it wouldn't be prioritized to be fixed.


Dogfooding is great, if there actually is a fix to the issue at hand. If the issue is "anonymous internet access gets abused a lot" then I'm not sure what a tech support guy is meant to usefully contribute to ending the dogfooding pain.


> Dogfooding is great, if there actually is a fix to the issue at hand. If the issue is "anonymous internet access gets abused a lot" then I'm not sure what a tech support guy is meant to usefully contribute to ending the dogfooding pain.

The problem being solved was not "anonymous internet access gets abused a lot"; it was "the mechanism we use to combat abuse is too aggressive for some segments of our users, thereby denying service to actual humans, and these challenges are invisible to our engineers and employees because they don't browse on connections that trigger the system".

As the blog post shows (and this is backed up by my experience as a user of Tor) CloudFlare significantly improved its handling of this in response. So I would say this is a success. I am sure that pressure from employees, including support guys, caused this long-running issue to finally get the attention it deserved.


Exactly.


Others have already mentioned how Tor bandwidth/latency issues could make things substantially worse for 'real' Tor users outside of their simulation.

But the bigger issue I would wonder about is Google's reputation systems. Google does not treat all CAPTCHA requests equally.

An office full of CloudFlare employees peaceably going about their daily browsing is going to get a much easier CAPTCHA situation than a Tor IP containing a mix of automated nefarious activity and individuals peaceably browsing from infomatically repressed countries.


Sounds like the "old wise sage master teaching grasshopper employees a lesson" pattern.

But does he have such a low opinion of his employees that he couldn't simply tell them "captchas are bad UX, try it out if you don't believe me"?

It just seems condescending to me.


I didn't see it like that at all...

It was more of a "let's really feel the pain from this and try to come up with a better solution". And at the end of it they didn't. They felt the pain of having to do it for just about everything, and still couldn't come up with a better solution other than "improve the captcha system".

And from there they gave some suggestions on how they could work with tor to give a better experience.

Simply saying "captchas suck, find a better way" would have gotten nowhere because the people involved wouldn't personally know the pain points, they wouldn't know the exact issues, or what exactly about it was the most annoying. (was it the captcha itself, or the frequency you had to fill it out? Did it completely break some of their workflows, and what were the ways they ended up getting around those breakages, and can they prevent them? Perhaps they found the redirect page broke some browsers/software/extensions!)

I would kill to have a one-on-one with a typical "user" of my products. A long sit down meeting to understand exactly what they want/need above "i want it to work better". Making your developers/staff users of your product is a great way to do that and to really internalize it.


What's the rationale for this weak crypto in Tor?

Tor derives .onion addresses from hashes generated with the weak SHA-1 algorithm, and then uses only 80 of the hash's 160 bits to form the address, making them even weaker.

Other weaknesses include use of RSA-1024, people have been complaining to no avail since at least 2013: http://arstechnica.com/security/2013/09/majority-of-tor-cryp...
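For reference, a v2 .onion address is (roughly) the base32 encoding of the first 80 bits of the SHA-1 hash of the service's DER-encoded RSA public key. A minimal sketch in Python, using a stand-in byte string where the real DER-encoded key would go:

```python
import base64
import hashlib

def v2_onion_address(der_public_key: bytes) -> str:
    """Derive a v2 .onion address: base32 of the first 80 bits of SHA-1."""
    digest = hashlib.sha1(der_public_key).digest()  # 160-bit digest
    truncated = digest[:10]                         # keep only 80 bits
    return base64.b32encode(truncated).decode("ascii").lower()

# Stand-in bytes; a real address hashes the DER-encoded RSA-1024 public key.
addr = v2_onion_address(b"stand-in for DER-encoded public key")
print(addr + ".onion")  # always 16 base32 characters before ".onion"
```

The truncation to 10 bytes is exactly the "only 80 of 160 bits" complaint above.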


This isn't actually a vulnerability yet, as SHA-1 is only known to be vulnerable to collision attacks (where you try to find two messages with the same hash) rather than pre-image attacks (where you try to find a message with a specific hash); almost no hash functions have ever been found to be vulnerable to pre-image attacks: https://github.com/zooko/hash-function-survey/blob/master/pr... Secondly, the issue is moot because generating collisions on 80 bits takes just 2^40 work - not hard at all.

Basically, what that means for Tor is that while it'd be pretty easy for an Onion site operator to generate two keys corresponding to the same .onion address, an _attacker_ still has to do 2^80 work to attack a site by generating a key with the same Onion address. While that's not great - 2^128 work is considered "standard" in cryptographic work - 2^80 work is still hard enough that there are probably cheaper ways of attacking Onion sites (for reference, the cumulative total work done by all Bitcoin network miners in the entire history of Bitcoin is about 2^80 hashes).
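The gap between 2^40 collision work and 2^80 pre-image work is just the birthday bound: colliding an n-bit hash takes roughly 2^(n/2) tries. A toy demonstration on a 16-bit truncation of SHA-1 (16 bits only so it finishes instantly; 80 bits would take the ~2^40 tries mentioned above):

```python
import hashlib

def truncated_sha1(data: bytes, nbytes: int) -> bytes:
    """SHA-1 truncated to the first `nbytes` bytes."""
    return hashlib.sha1(data).digest()[:nbytes]

def find_collision(nbytes: int):
    """Birthday search: expected ~2^(4 * nbytes) tries before a repeat."""
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()   # each message is distinct by construction
        h = truncated_sha1(msg, nbytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        i += 1

# 2 bytes (16 bits): expected ~2^8 tries, instant on any machine.
a, b = find_collision(2)
assert a != b and truncated_sha1(a, 2) == truncated_sha1(b, 2)
```

The same search against the full 80-bit truncation would need on the order of 2^40 hashes, which is exactly why a site operator (who controls both keys) can collide, but an outside attacker targeting one specific address cannot.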

As for the 1024-bit pubkeys, I'm not sure what the status of that is; from what I hear, Tor is actively working towards an Onion redesign that will fix these issues, and the longer pubkeys may already have landed.


Tor 224 proposes moving hidden service addresses to full ed25519 keys rather than truncated hashes of RSA keys amongst other things.

https://gitweb.torproject.org/torspec.git/tree/proposals/224...
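As I read the proposal, the new address format is base32(pubkey || checksum || version), where the checksum is the first two bytes of SHA3-256(".onion checksum" || pubkey || version) and the version byte is 0x03. A sketch (dummy all-zero key, purely to show the shape):

```python
import base64
import hashlib

def v3_onion_address(ed25519_pubkey: bytes) -> str:
    """prop224-style address: base32(pubkey || checksum || version)."""
    assert len(ed25519_pubkey) == 32        # full ed25519 public key
    version = b"\x03"
    checksum = hashlib.sha3_256(
        b".onion checksum" + ed25519_pubkey + version
    ).digest()[:2]
    body = ed25519_pubkey + checksum + version   # 35 bytes -> 56 chars
    return base64.b32encode(body).decode("ascii").lower() + ".onion"

addr = v3_onion_address(bytes(32))  # dummy key, for illustration only
assert len(addr) == 62 and addr.endswith(".onion")
```

Note the address now encodes the entire public key, so there is no truncated hash left to collide.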


nitpicking but:

> This isn't actually a vulnerability yet, as SHA-1 is only known to be vulnerable to collision attacks

should be:

* this isn't a vulnerability (there is no reason to believe it will become vulnerable to pre-image attacks one day)

* SHA-1 is thought to become vulnerable to collision attacks

> total work done by all Bitcoin network miners in the entire history of Bitcoin is about 2^80 hashes

setting hashes aside, the current number of cycles per second done by the Bitcoin network is around 2^90


> SHA-1 is thought to become vulnerable to collision attacks

SHA-1 is actually known to be, not just thought to become, vulnerable to collision attacks at less than the full bit strength of the hash[0].

The important part is the pre-image resistance, of which there is no known attack.

https://en.wikipedia.org/wiki/SHA-1#Attacks


Right, it's more than certain now that with time we will be able to find a collision ("an estimated cost of $2.77M to break a single hash value by renting CPU power from cloud servers").

But there is a difference between the theory and actually finding a collision. And then a huge difference as well on how to exploit that.


Legacy code. Just like the I2P team, they're currently working on modernizing their crypto.


I have never successfully completed a captcha served up by cloudflare (and thus Google) on Tor. They are fiendishly difficult to the point I suspected the mechanism is broken.


I use the audio captcha.

Works every time.

The ones that are impossible for me are the street signs, and after that all of the cultural ones (I see US captchas, and when asked to select a sandwich or a recreational vehicle I'm doomed to not complete them).

An interesting side effect of failing a captcha is that to Google this looks like proof the captcha is working, that you're likely to be a bot, and that they should definitely give you the hard captchas.

As such, if you cannot complete a captcha the chances increase that you must now complete multiple difficult captchas.

The audio captcha is delightfully simpler though it does take a moment longer to complete.


Doesn't the audio captcha work only if you have JavaScript enabled? If you enable JavaScript on Tor then you are doing it wrong.

I could be wrong though


> I have never successfully completed a captcha served up by cloudflare (and thus Google) on Tor. They are fiendishly difficult to the point I suspected the mechanism is broken.

Have you tried recently? They have gotten way better. There was a time not long ago where I would have agreed with you (the CAPTCHAs were literally impossible for a human in most cases).

CloudFlare has toned the CAPTCHAs down a lot recently, they're now presenting image classification tasks (select street signs/bodies of water/storefronts/cactuses/...), and in my experience, my success rate is close to 100% on those.

Admittedly they still make me solve three of them and they get tiring, but at least they're not completely cutting off access to Tor users anymore.


I have to ask! Can somebody tell me:

Am I supposed to click the tiny corners of signs in another square? If it is a sign that is not a "street sign", such as a billboard, do I click it?

::confused::


In my experience, some edges and corners count, and others don't. Just what the cutoff is, I don't know. It does seem clear that signposts don't count.


+1

I just give up immediately when I see a Cloudflare captcha page on Tor.


If you enable Javascript, you will get an "easier" Captcha. The non JS one is nearly impossible.


It's also worth noting that they did all this whilst getting some serious stick from some of the core Tor devs: https://trac.torproject.org/projects/tor/ticket/18361


I love Tor, and I like CF as a service though I'm not convinced they're a net positive for society. But wow, the non-CF guy (cypherpunks) on that ticket was really being a dick. Gotta hand it to the CF people that they kept engaging and trying to figure out some solution.

Though it seems that the idea of allowing GETs when a site isn't under load attack is probably the right solution?


All anonymous comments show up as "cypherpunks".


I'm glad to see CloudFlare addressing Tor users and issues with CAPTCHA, as I've been a victim of this myself multiple times in recent years. In particular, there's the issue that CloudFlare assumes javascript-enabled browsers, a condition which may well not be met. I recorded an exchange with CloudFlare support some time back in which the CloudFlare rep was apparently unaware how or why this might occur:

https://plus.google.com/104092656004159577193/posts/H2sakaRx...

I'm also aware of some tools/approaches which address the question of fair anonymity -- ensuring well-behaved clients while retaining anonymous status for the client. Best I'm aware these are very experimental. I've also forwarded the information to TK Hyponnen of F Secure, who may have some impression of the approaches.

FAUST: https://gnunet.org/node/1704 (Efficient, TTP-Free Abuse Prevention by Anonymous Whitelisting | GNUnet)

Fair Anonymity: http://arxiv.org/pdf/1412.4707v1.pdf

Assessing these is beyond my skills, but the references may be useful to CloudFlare (or others).


I previously didn't have an opinion on CloudFlare, until recently using the net from a network that was blacklisted - web surfing was reduced to constantly filling out captchas. The bottom line for me is that no single organization should have so much power, so I have stopped using CDNs and encourage everyone else to do the same.


I'm one of Cloudflare's customers who would blacklist all Tor traffic if I could. I genuinely don't understand why so many people obviously use Tor for all their browsing, and not just for sites where remaining anonymous is desirable. Why not simply switch to a normal browser for normal sites?

Some background - we run several SaaS services for schools, which are politically and socially non-sensitive. The only realistic reasons anyone would want to connect anonymously would be nefarious. Allowing Tor traffic is like a bank having a special ATM round the back with no security cameras - you're giving a free pass to attackers to try anything they want with impunity.

I'm having a hard time seeing what the compensating advantage is. How does not accepting Tor traffic to our "normal" sites lessen the anonymity of Tor traffic to sites where it is important?


The canonical answer is https://www.torproject.org/about/torusers.html.en

Really, no one has any business looking at anything I do on the Internet. I don't use Tor for everything but I may use it when I'm connecting via a network that I believe to be hostile (i.e. just about everything outside of my house)


> (i.e. just about everything outside of my house)

Which ISP do you use? I would be surprised if a trustworthy ISP exists.


Internode, in Australia. They're subject to the data retention laws, so I use HTTPS and VPNs as necessary.

When they're completely consumed by their new owners, TPG, then no, there won't be a trustworthy ISP in Australia. I have high hopes for SkyMesh, though.


If you use Tor only for "bad stuff", then being on Tor means you must surely be doing "bad stuff".

Using Tor for everything helps build plausible deniability: if you are always on Tor, an external watcher can't determine whether you are doing good or bad stuff.

Of course, good and bad are relative terms; if your country doesn't have free speech, the "bad stuff" is obviously just speaking freely.


Then allow the Tor users and watch them carefully. If you think you have a way to definitively tell that some traffic is malicious, why oh why would you interrupt your enemy while they're making a mistake? Do you think they're not going to come back at you from a clearnet IP?

Edit:

> I'm having a hard time seeing what the compensating advantage is. How does not accepting Tor traffic to our "normal" sites lessen the anonymity of Tor traffic to sites where it is important?

The anonymity of Tor depends upon diversity. The more people using Tor for more things, the harder it is to correlate any particular person's traffic.

That being said, it is your website, and if you decide to block Tor you have as much right as your users do to use Tor. But I'd ask you to think about whether you are actually attaining any benefit.


CAPTCHAs are essentially a broken idea: it is easy for an attacker to relay the CAPTCHA image to another website and have users there solve it for a completely unrelated goal. This trick has been used in the past on certain pr0n websites (users are allowed to see a picture only after they complete the CAPTCHA). Also, one could use a mechanical turk service to circumvent CAPTCHAs.


It means the activity done needs to be more profitable than the cost of a captcha, which means some activity will be deterred.


reCAPTCHA does things like track your mouse movement and a bunch of other hocus pocus.


I really think that the importance of all that extra stuff is massively overstated. I do a large amount of browsing while in incognito mode without being signed in to a Google account and I have NEVER been able to pass one of the new reCAPTCHAS with just a click. I have to complete a challenge every time.

Conversely, while signed into my gmail account I get passed through immediately, regardless of whether I click the box or tab into it and hit space.


Yes and it's still ridiculously easy to automate.


To be fair, the "I'm not a robot"-one-click-thing wasn't done to make automation harder or impossible, but rather to make things more convenient for users. It will fall back to a regular visual captcha if you're doing anything suspicious like requesting captchas at the rate necessary to do comment spam or vulnerability scanning efficiently, so that's probably not going to reduce anyone's captcha typer farm bill too much.


> and a bunch of other hocus pocus

Sounds like security by obscurity.

By the way, some people use a track pad (with a stylus), where the mouse can jump discontinuously.


They aren't taking any one thing at face value, but are combining them to get a better picture.

You might use a trackpad, so you'd "fail" that test, but your useragent is normal, you've been seen before with those cookies, and your IP is good so you are fine.

But if your IP is a known "bad actor", your useragent is something never before seen, your mouse movements are abnormal, and your keyboard inputs are instant, well all of that combined means you are getting blocked.
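A hypothetical illustration of that kind of signal combination (the weights and thresholds here are purely made up, not anything Google has published): score each weak signal, sum them, and only escalate when the total crosses a threshold, so no single "failed" test condemns you.

```python
# Hypothetical weights -- illustrative only.
WEIGHTS = {
    "bad_ip_reputation": 0.5,
    "unknown_user_agent": 0.2,
    "abnormal_mouse": 0.2,
    "instant_keyboard": 0.3,
    "no_prior_cookies": 0.1,
}

def risk_score(signals: set) -> float:
    """Sum the weights of every signal that fired."""
    return sum(w for name, w in WEIGHTS.items() if name in signals)

def decide(signals: set) -> str:
    """Escalate only when the combined score crosses a threshold."""
    score = risk_score(signals)
    if score >= 0.8:
        return "block"
    if score >= 0.3:
        return "captcha"
    return "allow"

# A trackpad user "fails" one test but passes overall:
print(decide({"abnormal_mouse"}))   # allow
# A known-bad IP with a fresh, robotic browser gets blocked:
print(decide(set(WEIGHTS)))         # block
```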


> But if your IP is a known "bad actor", your useragent is something never before seen,

What if I install a new computer on an IP address freshly provided to me by my ISP? Or what if I just open a new incognito window? Will I get blocked?

> your mouse movements are abnormal, and your keyboard inputs are instant

It seems to me these are really easy to fake programmatically.


>What if I install a new computer on an IP address freshly provided to me by my ISP? Or what if I just open a new incognito window? Will I get blocked?

If there are enough "red flags" you'll probably get a captcha, if there are an overwhelming number of "red flags" you might just get blocked.

Again, just opening an incognito window or a new computer/ip isn't going to do it alone.

>It seems to me these are really easy to fake programmatically.

I'm sure they are, but they make the bar for "automated traffic" a little higher, and weed out some of the lower hanging fruit.


I encountered an impossible situation working on a wordpress site the client insisted needed to be fully reachable via Tor. Parts of the page loaded from a cloudflare CDN but the main site didn't. The user was never presented with the CAPTCHA of course, 3/4 of the page was just missing with no explanation. I never did find a way around that.


As mentioned in the article, if you control the cloudflare CDN you can now whitelist all tor access.


I think he is talking about https://cdnjs.com/

I'd really hope that CloudFlare whitelists Tor for CDNJS.


Probably not cdnjs -- there are lots of people who use CloudFlare for an assets domain (since it's free) -- if they are just serving images, it's a problem to display the CAPTCHA. It is probably best practice, if none of those assets are sensitive, to disable as much security as possible on that domain. It might be worth having some packages of defaults for tuning that. (One of the benefits for our enterprise customers is one of our staff works with them to tune settings.)


Something people need to understand: real anonymity is really, really hard. Your COMSEC is a fairly small portion of the attack surface area, and the consequence of this is that staying anonymous is, BY NECESSITY, going to be very inconvenient.

From this perspective, captchas are a very minor concern. I'm as pro-privacy as anyone, but this expectation that anonymous activity is supposed to be easy or convenient will never be satisfied. Thousands of years of lessons from both military and civilian clandestine operations bears out the critical lesson that anonymity is, by default, very very inconvenient. Nothing is going to change that.


Agree, though pointless systems that likely extract the identity of a user, force them to work for free, etc - and fail to counter the risk they supposedly stop is abusive.


Probably not really suitable for Cloudflare scale, but for others who are forced to use CAPTCHA, consider https://hashcash.io/ for easy Proof-of-Work integration.


One should always be skeptical of security features that do not present their benefits in a clear way. The article, for example, claims that 18% of global email spam comes from email addresses harvested using Tor, but are those 18% exclusively using Tor?

Do users who publish their email address on a website hosted through CloudFlare see less spam? It should be a fairly easy thing for CloudFlare to test, while also testing vulnerability scans and login attempts. As an aside, it would also be interesting to see whether there are qualitative vs. quantitative differences in the malicious activity (i.e., whether serious attempts are done through botnets and script-kiddie activity through Tor).

The last and final test to verify a security measure with a cost as high as this one is to ask whether it has any meaningful impact on the end result. A website with 10,000 vulnerability scans per day is not going to be meaningfully improved if that is reduced to 5,000 per day, even though that is a 50% reduction. If there is a known vulnerability, the site is going to get hijacked either way.


Worth noting that a surprisingly common amount of sites have other sections of their site which are not routed through Cloudflare. Look for instances of DNS records like this:

    admin.example.com
Such a record is usually not routed through Cloudflare, because the last thing a webmaster wants is to solve captchas for their own website. They don't, however, care much for their visitors if they're subjecting them (possibly a substantial number) to captcha-solving nonsense.

The content in the non-CF sections of a site can still be accessed because the webmaster is lazy and didn't check whether a visitor can do a DNS dig on all their DNS records.
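Checking which records bypass Cloudflare amounts to resolving each hostname and testing whether the resulting IP falls inside Cloudflare's published ranges. A sketch using two of the published IPv4 ranges (illustrative subset; the authoritative list is at cloudflare.com/ips):

```python
import ipaddress

# Two of Cloudflare's published IPv4 ranges (illustrative subset).
CLOUDFLARE_RANGES = [
    ipaddress.ip_network("104.16.0.0/12"),
    ipaddress.ip_network("172.64.0.0/13"),
]

def behind_cloudflare(ip: str) -> bool:
    """True if the resolved IP sits inside a Cloudflare range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)

# A record resolving outside these ranges (e.g. an admin subdomain's
# origin server) is reachable without ever hitting the captcha page.
print(behind_cloudflare("104.16.1.1"))  # True
print(behind_cloudflare("8.8.8.8"))     # False
```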

Or you can simply use Tor pluggable transports to pretend you're Googlebot, and also hide all your traffic in Google-like traffic.

I would reserve this for rare cases as there are people in censorship prone countries who really need this bandwidth :)


The pluggable transports are for connections from your Tor client into the Tor network, not connections from exit nodes to the rest of the Internet. Cloudflare (or any destination host) would still be able to detect your connection as originating via Tor.


There are, in fact, innumerable ways to hide from a website the fact that you're using Tor, and you can read up on these in the Tor documentation.

Ideally you're looking to use Tor as the first hop, and then you dial into the wider Internet with a VPN or, as I mentioned, using various Google services to camouflage traffic instead of a VPN. This is where pluggable transports come in: because Google doesn't like Tor, you want to choose how you're connecting to Google, and you get to traverse the Tor network to find an optimal route.


Pluggable transports are intended to stop your local ISP detecting or blocking your connection to Tor: https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt

While I agree that you can mask your exit from the Tor network via an additional proxy or VPN, that's not the role of pluggable transports. They're only for connecting in to the Tor network, not out from it.


Anecdata: most of the scamming directed towards our startup comes from Tor exit nodes.


I don't use Tor daily (although I run a small exit node), but I do surf via a VPN. cloudflare's captchas drive me to the brink of insanity, I see at least 50 each day...


Cloudflare says:

> With most browsers, we can use the reputation of the browser from other requests it’s made across our network to override the bad reputation of the IP address connecting to our network. For instance, if you visit a coffee shop that is only used by hackers, the IP of the coffee shop's WiFi may have a bad reputation. But, if we've seen your browser behave elsewhere on the Internet acting like a regular web surfer and not a hacker, then we can use your browser’s good reputation to override the bad reputation of the hacker coffee shop's IP.

I occasionally use a VPN, but I've never gotten a CloudFlare captcha. Is it possible that you might be doing something else other than just using a VPN, such as blocking cookies?


Have you considered setting up your own VPN? That would make the ip "clean" and get less captchas, right?


> Have you considered setting up your own VPN? That would make the ip "clean" and get less captchas, right?

And would negate any anonymity offered by using a VPN.


Tor : Anonymity :: VPN : Privacy

VPNs do not provide anonymity.


I think you misunderstood -- I parsed it as "And using a VPN would negate any anonymity offered (by using Tor)".


In addition to CAPTCHAs, why not just have a button that runs some JavaScript that completes a proof-of-work similar to what mining bitcoins does? You could make it only take 5 seconds on a modern laptop CPU, about as long as it'd take to enter the CAPTCHA anyway, but it'd potentially be a very large road block for spammers/DDOSers.

For those on phones, you can still opt for CAPTCHA if you don't want to kill your battery.
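A hashcash-style version of that idea could look like the following sketch (parameters illustrative; a real deployment would tune the difficulty and bind the challenge to a session):

```python
import hashlib
import os

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Find a nonce so SHA-256(challenge || nonce) has `difficulty_bits`
    leading zero bits; expected cost ~2^difficulty_bits hashes."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        h = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server-side check: a single hash, regardless of difficulty."""
    h = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(h, "big") < (1 << (256 - difficulty_bits))

challenge = os.urandom(16)      # fresh per-request server challenge
nonce = solve(challenge, 12)    # ~2^12 hashes, well under a second
assert verify(challenge, nonce, 12)
```

The asymmetry is the point: the client burns CPU proportional to 2^difficulty_bits, while the server verifies with one hash.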


Spending 5 CPU seconds to submit one comment might be significantly cheaper for a spammer than paying someone to solve the captcha for them.

Additionally, specifically with Tor users, you can expect a large chunk of the user base to have JavaScript disabled completely. You can do many things with JavaScript that could be used to build a browser fingerprint, so someone who's already using software to browse the web anonymously is very likely to disable that.


But it isn't cheaper than just mining bitcoins with that same CPU load. This wouldn't prevent spamming, it would just make it unprofitable compared to the alternatives (mining bitcoins).

The JavaScript issue can be gotten around with a browser plugin that does it as well, which would be easy to bundle on the existing Tor browser. JavaScript would still be fine for all the VPN users who get stuck with these things, and the regular users who get them occasionally for whatever reason.


You're not going to make any significant profit mining bitcoin on a desktop CPU, or any "normal" CPU for that matter.

If we assume 5 seconds of CPU time per comment, that's ~17k per day or ~500k per month. The first captcha solving service I found sells 100k solved captchas for $139, so that's about $700 for 500k. As a spammer, I could probably post 5 to 10 times more comments for the same amount of money using your system. This is obviously a very rough estimate, but it should get my point across.


I did this with the comments box on my blog. I don't think it's technically effective, because doing proof of work in JavaScript is orders of magnitude slower than doing it natively (especially with GPU acceleration). It works well enough for me, because I'm not a big enough target, but it wouldn't be a major obstacle for someone who really wanted to cause havoc.

It would be interesting to see how fast you could make the JavaScript code. I'm sure my version is just terribly unoptimized. The requirement to support the least common denominator will present a major problem, though.


I would look at asm.js, run some tests with the major browsers, and beef it up. Leveraging asm.js conventions would make it painful under user agents that don't JIT (or compile AOT), though many do.

If the wasm effort works out (it's looking like it will), it would hopefully alleviate the issue you present entirely and make this solution viable in a very sane way.


It's better not to use JavaScript with Tor; the browser even suggests that you disable it.


I experimented with this [1] using GPU acceleration. Last I was working on it, phones required you to manually enable WebGL in the browser and even then I couldn't get it to work on mobile. I shelved it, though I probably could have gotten it working.

[1] https://s3-us-west-2.amazonaws.com/excredo/hashrate.html


Because that wouldn't stop them. Say instead that everyone gets a fixed delay in seconds. Then the spammers will just wait out that delay and then spam. Even if the delay is on every single page visit, that doesn't harm a botnet, because it can still do machine_count/delay visits per second.


It generally will stop them. The phrase that applies is something like "you don't have to outrun the cheetah, just another gazelle". It's not about eliminating the spammers, but about making it expensive enough - in CPU, memory, bandwidth, time, human intervention, etc. - that it's not worth them attacking you.

These days a spammer could train a ConvNet to pass reCAPTCHA with > 90% accuracy and very little processing overhead if they really wanted to. The only reason it works is because the bar is "high enough" that it's cheaper to spam somewhere else that has a lower bar.


It doesn't stop them, but it raises the costs of it significantly. Instead of hundreds of requests a second, they're down to one request every 5 seconds, and they're having to run the computer at a full CPU load 100% of the time. It would be more profitable for them to just mine bitcoins at this point, meaning they wouldn't waste that CPU load on spam submissions.


You're mixing the two cases. If every page is limited, then yes, they have to work hard, but they still get machines/delay visits per second. But a human will have to wait the full delay every time a page is visited. This is unacceptable for modern browsing.

If the delay is only once per domain, say, then you don't do anything against the spammers; they only have to wait out the delay once.


Maybe Cloudflare could have a browser plugin that preemptively creates "tokens", or essentially just mines Cloudcoins that you then spend to bypass CAPTCHAs. That way you could make it much more expensive than 5 seconds of CPU and there'd be zero delay (or perhaps not even the Cloudflare splash page). The use case for needing to constantly bypass CAPTCHAs is rare enough it seems reasonable to ask those people to use a browser plugin.
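A rough sketch of that token idea (all names here are invented for illustration; a real design would also need unlinkability, e.g. blind signatures, so the verifier can't correlate where a token was mined with where it was spent):

```python
import hashlib
import os

def mine_token(difficulty_bits=18):
    """Client side: pre-compute a token while the browser is idle.
    A random seed means the token isn't tied to any particular site."""
    seed = os.urandom(16)
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(seed + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return seed, nonce
        nonce += 1

def spend_token(seed, nonce, spent, difficulty_bits=18):
    """Server side: one hash to verify the work, plus a double-spend check."""
    digest = hashlib.sha256(seed + nonce.to_bytes(8, "big")).digest()
    if int.from_bytes(digest, "big") >= (1 << (256 - difficulty_bits)):
        return False
    if seed in spent:
        return False
    spent.add(seed)
    return True

spent = set()
seed, nonce = mine_token(difficulty_bits=12)   # low difficulty for the demo
assert spend_token(seed, nonce, spent, difficulty_bits=12)      # first spend OK
assert not spend_token(seed, nonce, spent, difficulty_bits=12)  # double-spend rejected
```

Because mining happens ahead of time, the per-token difficulty can be set far higher than anything tolerable at page-load time.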


Love it! I hope almost all authentication will become automated. My personal client machine is much better at proving who it (I) am than a human.


> We also made a change based on the experience of having to pass CAPTCHAs ourselves that treated all Tor exit IPs as part of a cluster, so if you passed a CAPTCHA for one you wouldn’t have to pass one again if your circuit changed.

I don't get how this is different from a supercookie. Anyway, I think globally that's a well-balanced reaction to the Tor issue.


> I don't get how this is different than a super cookie.

The supercookie would survive or be detectable across multiple browser sessions (keep in mind that the Tor Browser automatically deletes regular cookies when you quit). The behavior that CloudFlare is describing here works within a single Tor Browser session but not across multiple sessions.

I believe the Tor Browser is willing to send some first-party session data to a site after changing circuits, so that you wouldn't be logged out of an account if you logged in over Tor and kept that session active for long enough that Tor switched over to a different circuit. This behavior is basically what should allow CloudFlare to recognize that a particular Tor user has recently passed a CloudFlare CAPTCHA (on a particular site). However, if the user quits and restarts Tor Browser, CloudFlare will no longer be able to detect that it's the same visitor (if it could, that would be the supercookie case).


The one time I used Tor, the captchas were the ones with both words as squiggly letters, which wouldn't work even after 20 attempts; I had to give up.

I was pleased when, recently, I found out they switched to the image-based one. Sure, sometimes it still refuses to accept that I selected all the street signs, but at least I don't have to give up in frustration after 30 consecutive failed attempts.


Why do CAPTCHAs even exist for standard websites (as opposed to account creation, etc)?

Wouldn't something like rate limiting or proof of work achieve the same result? If you're simply allowing someone to browse, you don't actually care whether a user is real or not. You care about stopping automated comments/spam.

Is this just another tentacle of the advertising industry?


It's explained in the article.

On the other hand, anonymity is also something that provides value to online attackers. Based on data across the CloudFlare network, 94% of requests that we see across the Tor network are per se malicious. That doesn’t mean they are visiting controversial content, but instead that they are automated requests designed to harm our customers. A large percentage of the comment spam, vulnerability scanning, ad click fraud, content scraping, and login scanning comes via the Tor network. To give you some sense, based on data from Project Honey Pot, 18% of global email spam, or approximately 6.5 trillion unwanted messages per year, begin with an automated bot harvesting email addresses via the Tor network.

Rate limiting would block legitimate users, and proof of work doesn't impede the malicious uses of Tor.


That doesn't answer my question, though. All of those are reasons to suspect Tor exit nodes, but not reasons to place CAPTCHAs on standard article pages on a site.

The only vaguely reasonable one I can see there is 'ad click fraud', and I think that fundamentally restricting the usefulness of a site for advertising purposes is awful.


Are there any public projects that have used deep learning to defeat captchas? If not, I'm sure it's only a matter of time.


It's a cat-and-mouse game: of course captchas have been defeated in the past, but captchas just keep getting increasingly complex. Currently, for anonymous requests, reCAPTCHA by Google, which is what Cloudflare uses, asks you to "choose bodies of water" or street signs from a series of images, with each click sometimes revealing more options. It's fairly complex, so I guess it hasn't been broken yet. It's also a massive pain in the ass for legitimate visitors.


> If we provide a way to treat Tor differently by applying a rule to whitelist the network's IPs we couldn't think of a justifiable reason to not also provide a way to blacklist the network as well.

Yes there is: "we don't provide censorship as a service".


My trouble with reCAPTCHA...

By using reCAPTCHA, mentioned in the article as a preferred solution, visitors from China are routinely blocked, as reCAPTCHA is now hosted by Google.

That's the trouble with CAPTCHA hosting, if you intend to do anything with China-based orgs.


This wouldn't apply to Tor users (which this article is about), as reCAPTCHA can't identify users from China while they're using Tor.


Rather, the Great Firewall of China cannot identify reCAPTCHA (Google) when they are using Tor. I think it blocks Tor specifically, though.


Yep, that's it. I think "normal" relays get blocked almost immediately, but bridges with OBFS work to a certain degree.


This is why people should browse a lot more over Tor, so that the relative amount of malicious traffic is reduced and operators can no longer uphold the argument that Tor traffic is mostly malicious.


Unfortunately, it won't work.

1. It's a charity tax; you have to convince people to incur the cost of Tor (i.e. CAPTCHAs everywhere) for activities that don't require Tor.

2. You can't neutralise a poison by diluting it.

Firstly, from the operators' POV, if there's a widespread agreement that people use Tor even though they don't need to, then they know voluntary users can be pressured not to use Tor through sheer inconvenience. Even if you wanted to boycott a service that blocked Tor, it's notoriously hard to make good on that threat unless you wield a lot of power or annoyed a very large number of people. So the consequences are minor.

Secondly, the percentage of malicious Tor traffic is a red herring. What operators care about is the origins of malicious traffic. If 50% of your attacks come from one particular country (or Tor) and the cost of losing that traffic is less than the cost of that malicious traffic, there is a real incentive to block that traffic. Combined with the first point, the cost of losing voluntary Tor users is insignificant if they can easily choose not to use it.


> 1. It's a charity tax; you have to convince people to incur the cost of Tor (i.e. CAPTCHAs everywhere) for activities that don't require Tor.

Some people will (and do) do it. You're right that you won't convince everyone to run Tor all the time, but you won't need to.

Also, Mozilla has been floating ideas such as integrating Tor into Firefox for use in a new kind of private browsing mode. This affects things considerably.

> 2. You can't neutralise a poison by diluting it.

Yes, you can. Both in the metaphorical as well as the direct sense. At some point the solution is too dilute for the poison to cause harm.

I use Tor all the time. I know of local web shops that have rejected the idea of blocking Tor because they looked at their logs and saw that they get actual sales through it - from people like me.


> 1. It's a charity tax; you have to convince people to incur the cost of Tor (i.e. CAPTCHAs everywhere) for activities that don't require Tor.

People are willing to invest personal resources for charitable purposes. Why not here?

People are willing to fight against discrimination. Why not against discrimination of Tor users?

> 2. You can't neutralise a poison by diluting it.

There is also poisonous traffic from non-Tor addresses.

> Combined with the first point, the cost of losing voluntary Tor users is insignificant if they can easily choose not to use it.

People would strictly avoid restaurants that don't serve coloured people. Why don't they avoid services that don't serve Tor users?


How easy is it for you to know they don't serve Tor users unless you are a Tor user? This is like saying "Why don't colored people avoid restaurants that don't serve colored people?"


HN is the only place I bother solving CAPTCHAs. For everything else, Firefox (Tor Browser) has plugins to get a copy from archive.org, archive.is, or Google's cache. So if a page asks me to solve a CAPTCHA, I don't visit it.

This could be a benefit for the website (lower server load) or a harm (fewer people appear to be reading their content, fewer people see their ads). Whatever the case may be, I'm caught in the crossfire between crackers and servers. I don't care about their war at all. As far as I'm concerned, I'm winning.

Cloudflare wanted me to solve a CAPTCHA to read their article. I tried to archive it, but archive.is already had a copy of it. This happens to me quite often. So, obviously, I'm not the only one who has figured out a way around their war.

> Security, Anonymity, Convenience: Pick Any Two

Nah, I usually have all three.


> I usually have all three

...

> Firefox (Tor Browser) has plugins to get a copy from archive.org, archive.is, or Google's cache. So if a page asks me to solve a CAPTCHA, I don't visit it.

You don't have convenience.


A one-click measure to circumvent a captcha is pretty good convenience-wise imo.


Sure, but google cache/archive pages often lack some images, have broken javascript, etc. Additionally, there's a huge difference between "point and click", and "point, click, click again, look for the plugin, another click, do this for every page".


> "point, click, click again, look for the plugin, another click, do this for every page".

Honestly, I don't know how I can make the process as complicated as you described it, even if I wanted to. In reality, it is no more complicated than right-click, open in new tab.


Smart Tor users have JavaScript turned off anyway. You might be lacking the images, but lately this has been pretty good on archive.org.


Thus cementing the "no convenience" clause. I understand this is an acceptable tradeoff for some people (myself included) but you can't pretend it's convenient.


Again, I disagree. Not every website today requires JavaScript, and in fact most websites that cater to the sort of audience that includes Tor users are even less likely to. I don't think anyone sees Tor as a daily driver for general web browsing. It's not much less convenient for the use-cases it's meant to support.


Give it a try for a week. You might be amazed how fast, calm and content-rich the web can be, if you disable Javascript by default and whitelist when needed.


I'm well aware of what the web is like without JS. I know it's usable. I'm saying it's not convenient.

Whitelisting is not convenient.

If people are pretending it is, they're doing a disservice to the security community. Kinda how like people pretend GPG is usable and convenient, thus holding back progress in the security UX front.


I find SUBSCRIBE TO OUR NEWSLETTER and LIKE US ON SOCIAL MEDIA popups and ads much more inconvenient.

Sorry for the late reply.


For users who can afford it, routing a VPN through Tor solves the captcha issue with Cloudflare. It also adds an extra layer of security.


Can you elaborate?

I cannot imagine how Cloudflare could distinguish VPN traffic routed through Tor and standard traffic routed through Tor. The only difference is a hop on the front end, no change to what comes out the exit node.


If you set up an access point to route all of your traffic through Tor, then connect to your VPN through that access point, your IP is the VPN's IP, not the exit node's.


What's the point of doing that over connecting directly to the VPN? It seems like the benefit granted by Tor (not leaking who connected to the VPN) would be negated by the fact that there are now payment records from you to the VPN.


If you pay with BTC, prepaid gift card (paid for in cash) etc.. then there are no payment records from you to the VPN. The benefit of this is that the VPN provider doesn't know who you are because you are accessing the VPN through tor. Yet the VPN provides you a stable IP that won't be CAPTCHA'd like most normal tor exit nodes are.

Edit: This is good if you are trying to maintain an Internet profile (i.e. Facebook, twitter etc.) that isn't tied to your true identity.


But you are losing out on some anonymity here. The VPN provider may not know who you are but you are consistently making access with the same user ID and from the same IP. Your activity can be correlated to that account and that IP.

If that's not your aim (as you say, being signed in to the same Facebook account all day gives you the same problem anyway), then this isn't an issue.

But this isn't what a lot of tor users want tor for.


Right, but then (1) your VPN will have a browsing profile which aggregates your otherwise ephemeral, anonymized and un-correlated browsing sessions; and (2) it would be easy for adversaries to extract that very helpful profile. If your threat model does not include (2), (1) is still bad!


Anyone else concerned that the captchas offer an ability to de-anonymize someone completely?

For example, a backend algorithm could serve certain captchas that show up only in certain areas of the world.

I feel like this is entirely possible and is part of the reason I will not complete any captchas moving forward.


When using Tor, reCAPTCHA cannot tell where you are.


Definitely a big problem. And it's Google who has all the data.


I played around with Tor a couple of months ago when it hit the headlines again, just to see what it was all about, and the experience was awful. As an exercise, I decided to fire up Tor browser today just to see if it's gotten any better. It's worse.

Here's what I found:

Large image-based CAPTCHA: Tor is slow. I'm on a 75/25Mbps internet connection and it loads images similar to 24k dial-up. The CAPTCHA I was presented with was the highest bandwidth CAPTCHA I've ever seen. I was given 9 pictures and needed to select "Bodies of Water". Each click yielded a new square. I had to wait for 4 additional images to load before I could click "Validate". This took over a minute. Then, repeat, this time "Store Fronts", which were hard to discern (is it a blurry Apartment Building front or Store Front?). I received a connection error on one site so I had to repeat the process. With Javascript features turned off, it was a little easier, but included the extra step of having to paste a Base64-encoded string into a text box, which failed twice. Every site I tried gave me this CloudFlare CAPTCHA page.

One of the sites I pulled up had no images. I set my privacy settings to the least protected, enabling JavaScript and HTML5, assuming this was the problem. Nope. They used images from another site and I had to grab the image URL and paste it into a browser to see what was going on. It was yet another CAPTCHA. A few minutes later, the previous site displayed images properly.

On to Google. Privacy settings are still at the weakest setting. Type "Google" into the search engine and I get a "wavy text in an image" CAPTCHA. This loads quickly and is easy to answer, but just results in another CAPTCHA. I gave up after 10 tries. Bing, Yandex and Yahoo all worked with Yandex only presenting a CAPTCHA once after the third search I did (simple, like Google, but worked).

This is a terrible experience for people seeking to get around oppressive governments. While I applaud CloudFlare for dogfooding their CAPTCHA system, I doubt they did it in a way that truly simulates the experience via an extraordinarily slow internet connection which is what I ended up with when using Tor Browser. I wonder how much slower things would be if my internet connection was 1Mbps or being interfered with by government infrastructure. I understand the trade-off between securing a site from "evil traffic" that is more likely to originate from a Tor exit node, but why must they use such a bandwidth intensive CAPTCHA? A browsing experience that would have taken seconds to complete took me almost 5 minutes (and a lot of frustration) not including actually consuming the content I was looking for. Are the text in image based CAPTCHAs not good enough for this task? Are there other reasons I'm missing?


> They used images from another site and I had to grab the image URL and paste it into a browser to see what was going on. It was yet another CAPTCHA.

Maybe browsers shouldn't request <img> src with "Accept: */*", and Cloudflare should use that header to detect whether it can actually serve HTML?
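The heuristic being proposed could be sketched like this (illustrative only; real browsers send varying Accept values for subresources, so this is a hint rather than a reliable signal):

```python
def wants_html(accept_header):
    """A top-level navigation typically lists text/html in its Accept
    header; an <img> fetch typically asks for image/* or */*. Only serve
    the interstitial CAPTCHA page when HTML is explicitly acceptable."""
    return "text/html" in accept_header.lower()

assert wants_html("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
assert not wants_html("image/webp,image/*,*/*;q=0.8")
```

A request that doesn't accept HTML could instead get a plain 403, avoiding the broken-image case described above.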


A "whitelist" tor solution that isn't whitelisting tor by default is really lame. Approximately 0% of website operators will think to enable that so tor users are generally still treated like garbage visiting any CF protected website.


Great blog. Some of the best technical writing out there.


"We this is so important"


Captchas are evil.

Unfortunately, I do not trust Tor.[0][1] I'm unsure the internet will ever be anonymous unless large, completely private networks start gaining popularity.

[0]http://fusion.net/story/238742/tor-carnegie-mellon-attack/ (11/2015) [1]http://www.theguardian.com/technology/2014/jul/22/is-tor-tru...


"Trouble with Cloudflare"

Treating TOR traffic the same as non-TOR traffic makes no sense; read the main link for confirmation they do.

Case in point, and for starters: STOP repeatedly requiring a user from a session to keep passing "I'm not a robot" tests. Set a global cookie that's valid for the session, across all of Cloudflare's network, and honor it.

If the "I'm not a robot" test doesn't work unless it's repeatedly given, then that is the problem, not TOR.

Please address this issue; thanks.


If you set a global cookie, then someone intercepting the public traffic (think NSA, GCHQ,...) can identify what the user is reading. You just killed anonymity.


The global cookie is a session; if the user wipes the cookie, resets the TOR connection, etc., that's their issue. TOR is not designed to hide sessions, nor would setting a global cookie break anonymity unless the user doesn't understand how TOR works. All exit nodes are watched, and session device fingerprints are correlated with or without a global cookie.


If nobody sets a global cookie, a passive attacker cannot correlate different tabs using different circuits. See https://trac.torproject.org/projects/tor/ticket/3455 and https://www.torproject.org/projects/torbrowser/design/#ident...


If the same global cookie is accessible via two circuits, that's a bug in a product that uses TOR, not in TOR. I personally go above and beyond simply creating a new circuit: I never open two circuits at the same time or per boot, limit TOR sessions to single use, locally compartmentalize data per session, etc. TOR is not plug and play; it takes effort and discipline, and will never be a fully automated solution.


Yes, TBB on default settings is vulnerable to associating multiple tabs (if I'm reading the link above right), if an adversary sets a shared cookie. That does not mean it's ok for someone to set a shared cookie.

The possibility of exploitation does not mean exploitation or making exploitation easier is fine.


Point is, Cloudflare serving abusive volumes of requests is ironic; they should stop, and a global cookie won't harm anyone who knows how to use TOR. They could even offer the option NOT to set the cookie. That "I'm not a robot" doesn't work (happy to prove this) and that users don't understand how to use TOR are not excuses for their behavior and exploitation of users.


How are they exploiting users?


Requiring users to do work for free is the very definition. Google and Cloudflare are very aware that their tests don't work for stopping bots, but they're very good at extracting free labor.


Cloudflare gets no benefit from the captcha, so if they were useless as you claim, they have no incentive to keep them.


Unless you work at Cloudflare and are aware of its relationship with Google, any comment on their relationship is speculation. That data is vital to Google's future, and I think being valuable to Google beyond any direct benefit matters; I'm not aware of any company that provides more of this type of data to Google. Google would have to pay 10k+ contractors $30+ an hour to do this if it wasn't being done for free. Search [Google Search Quality Rater] if you're not aware of what I'm talking about.


I work for CloudFlare. We don't get anything from Google for using reCAPTCHA.


Thanks. It might be worth updating the blog post to reflect this, stating what percent of Google's reCAPTCHA data comes from Cloudflare, and explaining why Cloudflare doesn't roll its own CAPTCHA to ensure data is not being leaked/given to Google.


That wouldn't really help much, because then a bot owner just needs to solve one CAPTCHA, receive the "I'm not a robot" cookie, then hand it over to their bot.


Right, which is the issue, CAPTCHA's don't work.


Disagree. They don't work if you only show one CAPTCHA. Your proposal defeats the entire purpose, because it only takes human interaction once to defeat CAPTCHA over CloudFlare's entire network. CAPTCHA isn't a silver bullet, and I don't think anyone is lauding it as the end-all be-all to stop spammers and malicious activity.


Have you used TOR and experienced what Cloudflare does? They serve 3-10 tests per page requested in some instances, collecting massive amounts of training data; I'd in fact be surprised if Google doesn't place an incentive on collecting as much data as possible, likely enough to ID a user 99% of the time based on the input the user provides. As for it not being a silver bullet, it's well known that CAPTCHAs don't stop bots, but they do mine data from people. There's not a single Google "I'm not a robot" test that can't easily be circumvented; I just know most TOR users aren't able to do so, and Google & Cloudflare are exploiting this.


Why is everyone obsessed with knowing whether the user is a bot or an actual person? What difference does it make? A bot isn't inherently malicious; there are thousands of legitimate use cases: a bot may be downloading content as part of a script or some application, checking for updates, or creating a cache. Traffic generated by bots should not be blocked per se. There are certainly lots of malicious bots scanning for vulnerabilities, DDoSing sites, and so on, but this applies to people as well.

I think human-generated traffic may deserve priority, but blocking bots entirely is nonsense: ultimately the user agent is always a "bot" acting on behalf of an actual person, and by blocking this traffic you may always break some user workflow.


>There are certainly lots of malicious bots scanning for vulnerabilities, DDoSing sites and so on but this applies to people as well.

Nah, a human isn't going to waste their time refreshing a page manually 50,000 times.


But it surely can attempt cross-site scripting, send phishing and spam messages and broken requests, and look for exploits. Also, TOR is generally so slow I don't think there is even the possibility of generating enough traffic for a DoS.


It would depend on the website's resources and services. For example, a layer 7 DoS which just queries an expensive endpoint on the website over and over may not need high traffic volume to overload the website's systems.


Which seems to be a pretty easy pattern to detect. (And not something you'd want to do over Tor)


Sounds like something a bot would say.


CAPTCHAs are useful when you want to rate limit something to an extremely low rate, like for example attempting to login with a username and password.


You could slow down the responses or rate limit the requests, with no need to completely block automatic logins.


Tor largely prevents most approaches to this. You'd need some way to provide a ticket to a given client, check for it later, and, preferably, ensure anonymity over time.

I've just posted top-level in this thread listing two projects of which I'm aware that provide this, though as experimental protocols only. I've been mildly agitating for further development of such tools. Looks as if CloudFlare are working in a similar direction, which I see as positive.


If you are a valuable enough target, this is not an option.

IPs are cheap, if you let someone try 20 times in an hour before banning an IP, there are targets that people will cycle through IPs that quickly for.


You would have to limit it per IP address if you do not want to affect all clients. So it would not be effective against someone who can use many IP addresses (e.g. with Tor).
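For concreteness, a minimal per-IP sliding-window limiter of the kind being discussed (an illustrative sketch only; as this subthread notes, an attacker with many IP addresses, e.g. via Tor, sidesteps it entirely):

```python
import time
from collections import defaultdict, deque

class LoginRateLimiter:
    """Allow at most `limit` attempts per `window` seconds, per IP."""

    def __init__(self, limit=20, window=3600.0):
        self.limit = limit
        self.window = window
        self.attempts = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.attempts[ip]
        while q and now - q[0] > self.window:  # expire attempts outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Keyed on IP, this throttles one abusive source without affecting everyone else; keyed on nothing, it becomes the global limit the comment above warns against.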



