CAPTCHA Arbitrage

bartman · on Nov 24, 2010

"In an attempt to learn where the solvers live, Savage et al. sent out specially fabricated CAPTCHAs with images of words in various languages. [..] But one organization showed exceptional linguistic versatility, even solving challenges in Klingon."

This is a rather surprising finding.

yoonminn · on Nov 24, 2010

I combed through the paper and found this paragraph for you guys. The organization only answered two of the Klingon ones correctly:

---

Finally, the results for ImageToText are impressive. Relative to the other services, ImageToText has appreciable accuracy across a remarkable range of languages, including languages where none of the other services had few if any correct solutions (Dutch, Korean, Vietnamese, Greek, Arabic) and even two correct solutions of CAPTCHAs in Klingon. Either ImageToText recruits a truly international workforce, or the workers were able to identify the CAPTCHA construction and learn the correct answers. ImageToText is the most expensive service by a wide margin, but clearly has a dynamic and adaptive labor pool.

dkarl · on Nov 24, 2010

I came here to post the same. Has someone written an algorithm that can recognize an alphabet and solve any CAPTCHA based on an alphabet it knows (maybe all of Unicode)? Good god....

trotsky · on Nov 24, 2010

They were trying to measure the workers' native language by assuming error rates would go down for languages the workers knew. One provider had low error rates across the board. The easiest explanation for this is a different incentive system that promoted low error rates, likely at the expense of speed. I'd guess this was one of the expensive service providers, and that this is how they try to differentiate themselves in the market.

dkarl · on Nov 24, 2010

Wouldn't they throw aside a Klingon sample rather than spend time trying to decipher it? CAPTCHAs are supposed to expire after a minute or two.

I'm assuming the Klingon was written in the Klingon alphabet, though, which (as I realized after reading Tycho's post) could be wrong. If it was written in the Latin alphabet, then a person could easily have solved it letter-by-letter in a few seconds.

trotsky · on Nov 24, 2010

Yeah. I was assuming it was in a Latin alphabet, but I was wrong. I just read the paper: it was in a Klingon alphabet, so essentially arbitrary symbols. So my guess was wrong, and it's difficult to see an explanation here that makes any financial sense. Apparently they concluded that they were other researchers? I suppose you could imagine the results in some sort of auction system that ended up paying a lot (more than they were making) for problems no one else was willing to solve. But that doesn't seem too likely.

Tycho · on Nov 24, 2010

wouldn't they be Klingon phrases written with normal letters? i mean, nobody has a Klingon keyboard, do they?

sp332 · on Nov 24, 2010

There's a pretty common transliteration system for writing Klingon using Latin characters. http://en.wikipedia.org/wiki/Klingon_writing_systems

riffer · on Nov 24, 2010

The hard part is to insert more "mistakes" on the CAPTCHAs that are supposed to be tougher.

Xuzz · on Nov 24, 2010

    This last caveat leads to an interesting economic question. As noted above, retail prices for CAPTCHA-
    solving vary over a wide range, from about $1 per thousand to $20 per thousand. This price spread, and 
    the fact that it’s technically feasible to route a CAPTCHA through the system more than once, suggests a
    major arbitrage opportunity. We can set up a high-price CAPTCHA service and farm out all the actual work to
    low-price competitors. In a free economy—and what economy could be freer of regulation than a criminal
    one?—that situation is not supposed to endure.

Well, at least until they add CAPTCHAs.

adamtmca · on Nov 24, 2010

"This last caveat leads to an interesting economic question. As noted above, retail prices for CAPTCHA-solving vary over a wide range, from about $1 per thousand to $20 per thousand. This price spread, and the fact that it’s technically feasible to route a CAPTCHA through the system more than once, suggests a major arbitrage opportunity. We can set up a high-price CAPTCHA service and farm out all the actual work to low-price competitors. In a free economy—and what economy could be freer of regulation than a criminal one?—that situation is not supposed to endure."

Finack's comment on the original post was worth noting here: Arbitrage has far more to do with information transparency than the free-ness of the economy, as such arbitrage is pretty common in criminal economies where prices aren't public information.
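For a rough sense of the spread in that quote, here is a minimal back-of-the-envelope sketch in Python. It uses only the two retail extremes quoted above ($1 and $20 per thousand); the function name and the 1,000,000 volume are made up for illustration, and it ignores error rates, latency, and any fees, so treat it as an upper bound rather than a real business model.

```python
# Hypothetical arbitrage arithmetic: buy solves at the cheapest quoted retail
# price and resell them at the most expensive one.
# Prices are in dollars per thousand CAPTCHAs, taken from the quoted passage.
LOW_PRICE_PER_THOUSAND = 1.0
HIGH_PRICE_PER_THOUSAND = 20.0


def arbitrage_profit(num_captchas: int,
                     sell_per_thousand: float = HIGH_PRICE_PER_THOUSAND,
                     buy_per_thousand: float = LOW_PRICE_PER_THOUSAND) -> float:
    """Gross profit in dollars, assuming every resold solve is accepted."""
    thousands = num_captchas / 1000
    return thousands * (sell_per_thousand - buy_per_thousand)


if __name__ == "__main__":
    # Illustrative volume only; not a figure from the paper or the post.
    print(arbitrage_profit(1_000_000))
```

The margin is $19 per thousand at best, and it shrinks as soon as the cheap service's wrong answers have to be refunded or re-solved.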

willscott · on Nov 24, 2010

I went to a talk by Savage earlier this fall. The point that I took away from it is that what CAPTCHAs do is filter out the bad guys who haven't figured out a business model.

On the Klingon point: they theorized that the particular organization was a bunch of PhDs rather than farmed labor, and that it had learned from previous 'example answers' they had submitted. That particular organization, it was noted, was also an order of magnitude more expensive than the others.

gwern · on Nov 24, 2010

But... why is an organization of PhDs solving CAPTCHAs? Even if they are the most expensive service, it still seems to offer terrible hourly returns. That only raises the mystery.

aristus · on Nov 24, 2010

An organization of PhDs writing software that solves captchas.

roel_v · on Nov 24, 2010

"it still seems to offer terrible hourly returns."

Not if they train software to recognize letters. Once they succeed in that, they essentially have a money printing machine for a while.

Tycho · on Nov 24, 2010

I don't understand what difference it makes being in Klingon? Usually CAPTCHAs are just gibberish anyway. Recognizing individual letters remains the challenge.

rmc · on Nov 24, 2010

They used the Klingon alphabet, which is nothing like the Latin alphabet.

willscott · on Nov 24, 2010

They were asking for English translations of words in various languages, hoping to determine the demographics of the workers solving the CAPTCHAs.

Tycho · on Nov 24, 2010

are you sure? I thought they were simply postulating that native speakers would more accurately 'capture' the characters of a word/phrase if they recognized it from their own language. If you know what a word is supposed to be, then it's easier to read messy writing. I realize I could find out by just reading the PDF paper, but I think it's worth answering here for the benefit of HN readers

trotsky · on Nov 24, 2010

Interesting. I just read the PDF to see. They were sending CAPTCHAs where the goal was to translate a series of numbers written out as words in a given language into Arabic numerals. So (une)-(deux)-(trois) --> 1 2 3, along with instructions for solving the CAPTCHA in the same language.

As best I can tell, the Klingon used wasn't even in a Latin font. So the correct answers were essentially translating arbitrary symbols into their associated numbers. Certainly throws off what I was assuming from reading the blog post.

They do note that the service, ImageToText, was the most expensive by far.
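
To make the task concrete, here is a minimal Python sketch of what a correct solve looks like for the (une)-(deux)-(trois) example above. The lexicon, the function name, and the hyphen-separated challenge format are assumptions for illustration, not the paper's actual construction; a real solver would see an image, not a string.

```python
from typing import Mapping

# Hypothetical lexicon containing only the words from the example above.
FRENCH_DIGITS = {"une": 1, "deux": 2, "trois": 3}


def solve_number_captcha(challenge: str, lexicon: Mapping[str, int]) -> str:
    """Map each parenthesised number word to its digit, space-separated.

    Raises KeyError for a word the solver does not know, which is the
    situation a worker faces with an unfamiliar language or script.
    """
    words = [part.strip("()").lower() for part in challenge.split("-")]
    return " ".join(str(lexicon[word]) for word in words)


if __name__ == "__main__":
    print(solve_number_captcha("(une)-(deux)-(trois)", FRENCH_DIGITS))
```

A solver who knows the language effectively carries the lexicon in their head. For Klingon glyphs the mapping has to be learned from somewhere, which is what makes the two correct answers odd.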