Javascript that auto-fills captchas (userscripts.org)
82 points by soundsop on Jan 23, 2009 | 30 comments



This is exactly why, when we wrote the CAPTCHA code that protects the creation of SpiderOak 2 GB free backup accounts (well-commented GPLv3 code here -- https://spideroak.com/code ), we took the approach of not making them 100% human readable.

Supposedly it greatly increases the cracking difficulty at the relatively small expense that humans may have to click Next a small percentage of the time. If we notice abuse, it will be easy to tweak the code (it's just a Python GIMP script) to produce a radically different captcha with a few hours' work.


Last week, I was going to register for a new web service (I forget which one), and after the third captcha I couldn't get right, I just quit and decided to use a competitor instead.

What I'm trying to say is that ideally you shouldn't use a solution that is less user friendly and possibly infuriating for your future customers/users.


Indeed. It's a balance. Three failures is way too hard. Ours is a fairly simple captcha -- 5 letters/numbers, and our logs show that somewhere around 94% of attempts are correct, and we see few abandoned signups during the captcha answering phase. We could probably improve that further, though.

The primary defense against OCR is to make the segmentation attack hard -- pushing the characters together somewhat. With more tweaking we could probably get closer to a sweet spot of just enough overlap. Not even all of the characters would have to overlap to be effective.
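To make the segmentation point concrete, here is a minimal sketch of the naive attack, assuming the image has already been thresholded into rows of '#' (ink) and '.' (background). The function name `segmentByColumns` and the toy images are made up for illustration; this is not the SpiderOak code or the userscript's method.

```javascript
// Hypothetical helper: split a binary image (array of equal-length strings,
// '#' = ink, '.' = background) wherever a column contains no ink at all.
function segmentByColumns(rows) {
  const width = rows[0].length;
  const segments = [];
  let start = -1; // -1 means we are currently in a gap between glyphs
  for (let x = 0; x <= width; x++) {
    // The extra pass at x === width closes a glyph that touches the right edge.
    const hasInk = x < width && rows.some((row) => row[x] === '#');
    if (hasInk && start === -1) {
      start = x; // first inked column of a new glyph
    } else if (!hasInk && start !== -1) {
      segments.push([start, x - 1]); // inclusive column range of one glyph
      start = -1;
    }
  }
  return segments;
}

// Two glyphs separated by blank columns.
const spaced = [
  '##..##',
  '#...#.',
  '##..##',
];
// The same two glyphs pushed together until they touch.
const touching = [
  '####',
  '#.#.',
  '####',
];

console.log(segmentByColumns(spaced));   // two column ranges
console.log(segmentByColumns(touching)); // a single column range
```

Once the glyphs touch, the empty-column split sees one wide blob, so whatever classifier comes next is handed something that matches no single character.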


But it's a moot point, since anyone who really wants to defeat captchas en masse can just use Mechanical Turk or, even better, set up their own 'porn/warez' sites to show your captchas and have random internet users solve them.

There's no defense against that... Which makes captchas just a big irritating bag of fail.


Indeed! In my experience, the best captcha is asking for a credit card number.


In the same vein, the best captcha is not letting anyone sign up at all.


I think of it as similar to a home security system. Of course there are ways around it. Chances are that the effort involved means that a burglar will go rob a neighbor's house instead though.

Perhaps captchas make more sense in capital intensive industries with clear avenues for abuse. In the case of SpiderOak, we'd prefer to avoid making the free backup accounts an attractive prospect for warez distribution. YMMV.


Are you sure you're human?


I, for one, welcome our CAPTCHA-wielding overlords.


we took the approach of not making them 100% human readable.

Sounds like a great way to make sure neither bots nor humans use your service.


"It ain't so much the things we don't know that get us into trouble. It's the things we know that just ain't so."

Like everything else about building software, you have to test it. Watch the logs and see where people abandon the process. If it's a problem, fix it.


I don't follow. Even the best spam bots don't solve every CAPTCHA. If it's a miss (either because they got it wrong or because it's actually unsolvable), they'll just try again, no?

I would think actual humans have time that is more valuable than zombified Windows boxes.


The OCR thing is definitely interesting. But what I find more compelling is that the author created a simple neural network in javascript. Rock!


There's a static demo at http://herecomethelizards.co.uk/mu_captcha/ (via reddit comment)


Imagine how much power Google could harness if they just put a little JavaScript that did some parallelizable task on google.com. Since it's so many people's default page, at any given time there's probably millions of computers idling with it open.

If I were Google I would be sooo evil.


If they did that, Google wouldn't be Google anymore and probably everyone including me would be so pissed that they would use alternatives.


There are quite a few people running Google Toolbar (or Desktop) as a native executable too.


It really doesn't look to me like megaupload worked too hard on their captcha. In the general case, this is apparently a difficult problem because of the character segmentation step. So, my guess is that if megaupload did something to decrease the distinction between characters (script letters, for example), the feasibility of a javascript implementation would break down.


WOW is this going to piss off a lot of website owners who rely on captchas. If you can do this in Javascript, then you can do it much better in a server-side language and hook it to the front of your spam cannon. Boo to the spam arms race getting another degree more vicious.


Do you really think CAPTCHA-breaking is a new thing? There has been code around to break much more difficult CAPTCHAS for years.


Is it? I don't think so. It's very hard to segment the captcha. Could anyone suggest some ideas for splitting it into individual characters?
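One common idea, sketched here under heavy assumptions, is to flood-fill connected blobs of ink and treat each blob as a candidate character. The sketch assumes an already thresholded image given as rows of '#' and '.', and `findBlobs` is a made-up name. It separates glyphs whose columns overlap as long as their strokes do not touch; it still merges characters that really touch and splits dotted letters like "i".

```javascript
// Hypothetical sketch: find 4-connected ink blobs and return their bounding
// boxes, sorted left to right. '#' = ink, '.' = background.
function findBlobs(rows) {
  const h = rows.length;
  const w = rows[0].length;
  const seen = rows.map(() => new Array(w).fill(false));
  const blobs = [];
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (rows[y][x] !== '#' || seen[y][x]) continue;
      // Start a new blob and flood-fill it with an explicit stack.
      const box = { minX: x, maxX: x, minY: y, maxY: y };
      const stack = [[x, y]];
      seen[y][x] = true;
      while (stack.length > 0) {
        const [cx, cy] = stack.pop();
        box.minX = Math.min(box.minX, cx);
        box.maxX = Math.max(box.maxX, cx);
        box.minY = Math.min(box.minY, cy);
        box.maxY = Math.max(box.maxY, cy);
        for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
          const nx = cx + dx;
          const ny = cy + dy;
          if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
          if (rows[ny][nx] !== '#' || seen[ny][nx]) continue;
          seen[ny][nx] = true;
          stack.push([nx, ny]);
        }
      }
      blobs.push(box);
    }
  }
  return blobs.sort((a, b) => a.minX - b.minX);
}

// Two strokes that share column 2 but never touch.
const overlapping = [
  '###..',
  '.....',
  '..###',
];
console.log(findBlobs(overlapping).length); // 2
```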


Here's an idea I've seen implemented in some captchas:

Instead of generating an image of text that most people have to strain to read, generate a relatively clean image, but rather than having users type it in, have it ask a question to be answered.

"What do you get when you add 2 and 2?"

etc...

Would that fare better?


I've seen those stupid mathematics captchas; there's a thing called eval() in JavaScript.
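For a worded question you don't even need eval(), and feeding page text to eval() would run whatever the page put there. A rough sketch, assuming the question arrives as plain text in one fixed phrasing ("add A and B" or "multiply A and B"); the function name `solveWordedSum` is hypothetical and real captchas vary their wording:

```javascript
// Hypothetical solver for a single phrasing of a worded arithmetic question.
// Returns the numeric answer, or null when the phrasing is not recognised.
function solveWordedSum(question) {
  const match = /\b(add|multiply)\s+(\d+)\s+and\s+(\d+)/i.exec(question);
  if (!match) return null;
  const a = Number(match[2]);
  const b = Number(match[3]);
  return match[1].toLowerCase() === 'add' ? a + b : a * b;
}

console.log(solveWordedSum('What do you get when you add 2 and 2?')); // 4
```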


What about people who type in "four" or "IV" instead of the expected numeral "4"? The answer is still technically correct, but one fails the captcha.
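The site could normalize the answer before comparing. A minimal sketch, assuming small whole-number answers; the names `normalizeAnswer` and `romanToInt` and the word list are illustrative only, and the Roman parsing is deliberately lenient:

```javascript
// Hypothetical lookup tables; extend as needed.
const WORDS = {
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5,
  six: 6, seven: 7, eight: 8, nine: 9, ten: 10,
};
const ROMAN = { i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000 };

// Lenient Roman numeral reader: a smaller value before a larger one is
// subtracted. It does not reject malformed numerals such as "iiii".
function romanToInt(text) {
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    const value = ROMAN[text[i]];
    const next = ROMAN[text[i + 1]] || 0;
    total += value < next ? -value : value;
  }
  return total;
}

// Map "4", "four" or "IV" to the number 4; return null if unrecognised.
function normalizeAnswer(raw) {
  const text = raw.trim().toLowerCase();
  if (/^\d+$/.test(text)) return Number(text);
  if (Object.prototype.hasOwnProperty.call(WORDS, text)) return WORDS[text];
  if (/^[ivxlcdm]+$/.test(text)) return romanToInt(text);
  return null;
}

console.log(normalizeAnswer('four'), normalizeAnswer('IV'), normalizeAnswer(' 4 '));
```

The server would then compare `normalizeAnswer(input)` against the expected number rather than comparing strings.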


That is an IQ test. It fails to reliably differentiate between machines and humans, as machines become human.


John Resig did a writeup on how it works (and the author makes an appearance in the comments):

http://ejohn.org/blog/ocr-and-neural-nets-in-javascript/


I wrote a userscript to auto-fill the text-based CAPTCHAs that Drupal.org was using for a while. That was just some regexp fun. This is a bit more impressive.


Wait - you parsed an image using regexes?

Or is it something else?


Sorry, should have been clear. They were text-based of the form "which of these does not belong?"


How do you segment the captcha's text?



