This is exactly why, when we wrote the CAPTCHA code that protects the creation of SpiderOak 2GB free backup accounts (well-commented GPLv3 code here -- https://spideroak.com/code ), we took the approach of not making them 100% human-readable.
In theory it greatly increases the cracking difficulty at the relatively small expense that humans may have to click Next a small percentage of the time. If we notice abuse, it will be easy to tweak the code (it's just a Python GIMP script) to produce a radically different captcha with a few hours' work.
Last week, I was going to register a new web service (forgot which one) and after the third captcha I couldn't get right, I just quit and decided to use a competitor instead.
What I'm trying to say is that ideally you shouldn't use a solution that is less user-friendly and possibly infuriating for your future customers/users.
Indeed. It's a balance. Three failures is way too hard. Ours is a fairly simple captcha -- 5 letters/numbers, and our logs show that somewhere around 94% of attempts are correct, and we see few abandoned signups during the captcha answering phase. We could probably improve that further, though.
The primary defense against OCR is to make the segmentation attack hard -- pushing the characters together somewhat. With more tweaking we could probably get closer to a sweet spot of just enough overlap. Not even all of the characters would have to overlap to be effective.
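The core of that segmentation defense is just layout: advance each glyph by less than its full width so neighbors overlap, with a little random jitter. This is a minimal stand-alone sketch of that idea -- not SpiderOak's actual GIMP script, and the width/overlap/jitter values are illustrative assumptions:

```python
import random

def layout_captcha(text, glyph_width=30, overlap=8, jitter=3, seed=None):
    """Compute an x-offset for each glyph so that neighbors overlap.

    Overlapping glyphs defeat naive segmentation attacks (cutting the
    image into one clean sub-image per character) while staying mostly
    human-readable. glyph_width, overlap, and jitter are made-up
    example values, not the real captcha's parameters.
    """
    rng = random.Random(seed)
    x, positions = 0, []
    for ch in text:
        # place the glyph near x, nudged randomly so cuts aren't uniform
        positions.append((ch, x + rng.randint(-jitter, jitter)))
        x += glyph_width - overlap  # advance less than a full glyph width
    return positions

# each consecutive pair of glyphs is closer than one glyph width apart,
# so a fixed-width slicer can't isolate characters cleanly
print(layout_captcha("ABCDE", seed=1))
```

A renderer (GIMP, Pillow, whatever) would then draw each glyph at its computed offset; only the spacing logic matters for the segmentation argument.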
But it's a moot point, since anyone who really wants to defeat captchas en masse can just go the Mechanical Turk route, or even better, set up their own 'porn/warez' sites etc. to show your captchas and have random internet users solve them.
There's no defense against that... Which makes captchas just a big irritating bag of fail.
I think of it as similar to a home security system. Of course there are ways around it. Chances are that the effort involved means that a burglar will go rob a neighbor's house instead though.
Perhaps captchas make more sense in capital intensive industries with clear avenues for abuse. In the case of SpiderOak, we'd prefer to avoid making the free backup accounts an attractive prospect for warez distribution. YMMV.
I don't follow. Even the best spam bots don't solve every CAPTCHA. If it's a miss (either because they got it wrong or because it's actually unsolvable), they'll just try again, no?
I would think actual humans have time that is more valuable than that of zombified Windows boxes.
Imagine how much power Google could harness if they just put a little JavaScript that did some parallelizable task on google.com. Since it's so many people's default page, at any given time there are probably millions of computers idling with it open.
It really doesn't look to me like Megaupload worked too hard on their captcha. In the general case, this is apparently a difficult problem because of the character segmentation step. So my guess is that if Megaupload did something to decrease the distinction between characters (script letters, for example), the feasibility of a JavaScript implementation would break down.
WOW is this going to piss off a lot of website owners who rely on captchas. If you can do this in JavaScript, then you can do it much better in a server-side language and hook it to the front of your spam cannon. Boo to the spam arms race getting another degree more vicious.
Here's an idea I've seen implemented in some captchas:
Instead of generating an image of distorted text that most people have to strain to read, generate a relatively cleaner image, but rather than having users transcribe it, have it pose a question to be answered.
I wrote a userscript to auto-fill the text-based CAPTCHAs that Drupal.org was using for a while. That was just some regexp fun. This is a bit more impressive.
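That "regexp fun" translates directly: match the question pattern, map number words to values, compute the answer. The original was a browser userscript in JavaScript; this is an equivalent sketch in Python, and the "X plus Y" prompt format is an assumption, not the actual format Drupal.org used:

```python
import re

# map spelled-out number words to values; digits are handled separately
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def solve_text_captcha(prompt):
    """Answer simple math-question captchas like 'What is four plus 3?'.

    Sketch of the regexp trick described above; returns the answer as a
    string, or None if the prompt doesn't match the assumed pattern.
    """
    m = re.search(r"(\w+)\s*(?:\+|plus)\s*(\w+)", prompt, re.IGNORECASE)
    if not m:
        return None

    def value(token):
        token = token.lower()
        if token in WORDS:
            return WORDS[token]
        return int(token) if token.isdigit() else None

    a, b = value(m.group(1)), value(m.group(2))
    if a is None or b is None:
        return None
    return str(a + b)

print(solve_text_captcha("What is four plus 3?"))  # prints 7
```

Which is exactly why purely text-based math captchas only stop the laziest bots.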