This is exactly why, when we wrote the CAPTCHA code that protects the creation of SpiderOak 2GB free backup accounts (well-commented GPLv3 code here -- https://spideroak.com/code ), we took the approach of not making them 100% human-readable.
In theory it greatly increases the cracking difficulty at the relatively small expense that humans may have to click Next a small percentage of the time. If we notice abuse, it will be easy to tweak the code (it's just a Python GIMP script) to produce a radically different captcha with a few hours' work.
Last week, I was going to register a new web service (forgot which one) and after the third captcha I couldn't get right, I just quit and decided to use a competitor instead.
What I'm trying to say is that ideally you shouldn't use a solution that is less user-friendly and possibly infuriating for your future customers/users.
Indeed. It's a balance. Three failures is way too hard. Ours is a fairly simple captcha -- 5 letters/numbers, and our logs show that somewhere around 94% of attempts are correct, and we see few abandoned signups during the captcha answering phase. We could probably improve that further, though.
The primary defense against OCR is to make the segmentation attack hard -- pushing the characters together somewhat. With more tweaking we could probably get closer to a sweet spot of just enough overlap. Not even all of the characters would have to overlap to be effective.
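The core of that segmentation defense is just layout: advance each glyph by less than its full width so neighbors overlap, with a little random jitter. This is a minimal stand-alone sketch of that idea -- not SpiderOak's actual GIMP script, and the width/overlap/jitter values are illustrative assumptions:

```python
import random

def layout_captcha(text, glyph_width=30, overlap=8, jitter=3, seed=None):
    """Compute an x-offset for each glyph so that neighbors overlap.

    Overlapping glyphs defeat naive segmentation attacks (cutting the
    image into one clean sub-image per character) while staying mostly
    human-readable. glyph_width, overlap, and jitter are made-up
    example values, not the real captcha's parameters.
    """
    rng = random.Random(seed)
    x, positions = 0, []
    for ch in text:
        # place the glyph near x, nudged randomly so cuts aren't uniform
        positions.append((ch, x + rng.randint(-jitter, jitter)))
        x += glyph_width - overlap  # advance less than a full glyph width
    return positions

# each consecutive pair of glyphs is closer than one glyph width apart,
# so a fixed-width slicer can't isolate characters cleanly
print(layout_captcha("ABCDE", seed=1))
```

A renderer (GIMP, Pillow, whatever) would then draw each glyph at its computed offset; only the spacing logic matters for the segmentation argument.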
But it's a moot point, since anyone who really wants to defeat captchas en masse can just go the Mechanical Turk route, or even better, set up their own 'porn/warez' sites etc. to show your captchas and have random internet users solve them.
There's no defense against that... Which makes captchas just a big irritating bag of fail.
I think of it as similar to a home security system. Of course there are ways around it. Chances are that the effort involved means that a burglar will go rob a neighbor's house instead though.
Perhaps captchas make more sense in capital intensive industries with clear avenues for abuse. In the case of SpiderOak, we'd prefer to avoid making the free backup accounts an attractive prospect for warez distribution. YMMV.
I don't follow. Even the best spam bots don't solve every CAPTCHA. If it's a miss (either because they got it wrong or because it's actually unsolvable), they'll just try again, no?
I would think actual humans have time that is more valuable than that of zombified Windows boxes.
Imagine how much power Google could harness if they just put a little JavaScript that did some parallelizable task on google.com. Since it's so many people's default page, at any given time there are probably millions of computers idling with it open.
It really doesn't look to me like Megaupload worked too hard on their captcha. In the general case, this is apparently a difficult problem because of the character segmentation step. So my guess is that if Megaupload did something to decrease the distinction between characters (script letters, for example), the feasibility of a JavaScript implementation would break down.
WOW is this going to piss off a lot of website owners who rely on captchas. If you can do this in JavaScript, then you can do it much better in a server-side language and hook it to the front of your spam cannon. Boo to the spam arms race getting another degree more vicious.
Here's an idea I've seen implemented in some captchas:
Instead of generating an image of distorted text that most people have to strain to read, generate a relatively cleaner image, but rather than having users transcribe it, have it pose a question to be answered.
I wrote a userscript to auto-fill the text-based CAPTCHAs that Drupal.org was using for a while. That was just some regexp fun. This is a bit more impressive.
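That "regexp fun" translates directly: match the question pattern, map number words to values, compute the answer. The original was a browser userscript in JavaScript; this is an equivalent sketch in Python, and the "X plus Y" prompt format is an assumption, not the actual format Drupal.org used:

```python
import re

# map spelled-out number words to values; digits are handled separately
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def solve_text_captcha(prompt):
    """Answer simple math-question captchas like 'What is four plus 3?'.

    Sketch of the regexp trick described above; returns the answer as a
    string, or None if the prompt doesn't match the assumed pattern.
    """
    m = re.search(r"(\w+)\s*(?:\+|plus)\s*(\w+)", prompt, re.IGNORECASE)
    if not m:
        return None

    def value(token):
        token = token.lower()
        if token in WORDS:
            return WORDS[token]
        return int(token) if token.isdigit() else None

    a, b = value(m.group(1)), value(m.group(2))
    if a is None or b is None:
        return None
    return str(a + b)

print(solve_text_captcha("What is four plus 3?"))  # prints 7
```

Which is exactly why purely text-based math captchas only stop the laziest bots.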