"In an attempt to learn where the solvers live, Savage et al. sent out specially fabricated CAPTCHAs with images of words in various languages. [..] But one organization showed exceptional linguistic versatility, even solving challenges in Klingon."
I combed through the paper and found this paragraph for you guys. The organization only answered two Klingon ones correctly:
---
Finally, the results for ImageToText are impressive. Relative to the other services, ImageToText has appreciable accuracy across a remarkable range of languages, including languages where none of the other services had few if any correct solutions (Dutch, Korean, Vietnamese, Greek, Arabic) and even two correct solutions of CAPTCHAs in Klingon. Either ImageToText recruits a truly international workforce, or the workers were able to identify the CAPTCHA construction and learn the correct answers. ImageToText is the most expensive service by a wide margin, but clearly has a dynamic and adaptive labor pool.
I came here to post the same. Has someone written an algorithm that can recognize an alphabet and solve any CAPTCHA based on an alphabet it knows (maybe all of Unicode)? Good god....
They were trying to infer the workers' native language by assuming error rates would go down for CAPTCHAs in that language. One provider had low error rates across the board. The easiest explanation for this is a different incentive system that promoted low error rates, likely at the expense of speed. I'd guess this was one of the expensive service providers, and that this is how they try to differentiate themselves in the market.
Wouldn't they throw aside a Klingon sample rather than spend time trying to decipher it? CAPTCHAs are supposed to expire after a minute or two.
I'm assuming the Klingon was written in the Klingon alphabet, though, which (as I realized after reading Tycho's post) could be wrong. If it was written in the Latin alphabet, then a person could easily have solved it letter-by-letter in a few seconds.
Yeah. I was assuming it was in a Latin alphabet, but I was wrong. I just read the paper: it was in a Klingon alphabet, so essentially arbitrary symbols. So my guess was wrong, and it's difficult to see an explanation here that makes any financial sense. Apparently they concluded that they were other researchers? I suppose you could imagine the results coming from some sort of auction system that ended up paying a lot (more than they were making) for problems no one else was willing to solve. But that doesn't seem too likely.
"This last caveat leads to an interesting economic question. As noted above, retail prices for CAPTCHA-solving vary over a wide range, from about $1 per thousand to $20 per thousand. This price spread, and the fact that it’s technically feasible to route a CAPTCHA through the system more than once, suggests a major arbitrage opportunity. We can set up a high-price CAPTCHA service and farm out all the actual work to low-price competitors. In a free economy—and what economy could be freer of regulation than a criminal one?—that situation is not supposed to endure."
Finack's comment on the original post was worth noting here: Arbitrage has far more to do with information transparency than the free-ness of the economy, as such arbitrage is pretty common in criminal economies where prices aren't public information.
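To put rough numbers on the price spread quoted above, here is a minimal Python sketch. The function name and the volume figure are my own assumptions for illustration; only the $1 and $20 per thousand prices come from the quote. It ignores accuracy, latency, and any fees, so treat it as back-of-the-envelope arithmetic rather than a model of the actual market.

```python
def arbitrage_margin(resale_per_1000, wholesale_per_1000, volume):
    """Gross margin from reselling `volume` solves bought at the lower price.

    Hypothetical helper: prices are in dollars per thousand solves,
    and `volume` is the number of CAPTCHAs routed through.
    """
    spread = resale_per_1000 - wholesale_per_1000
    return spread * volume / 1000


# Prices from the quote: about $1 and $20 per thousand solves.
# The one-million volume is an arbitrary illustration.
margin = arbitrage_margin(20.0, 1.0, volume=1_000_000)
print(margin)  # 19000.0
```

If prices really were common knowledge, a margin like that should attract resellers until the spread closes, which is why the transparency point matters.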
I went to a talk by Savage earlier this fall. The point I took away from it is that what CAPTCHAs do is filter out the bad guys who haven't figured out a business model.
On the Klingon point: they theorized that this particular organization was a bunch of PhDs rather than farmed labor, and that it had learned from previous 'example answers' they had submitted. That organization, it was noted, was also an order of magnitude more expensive than the others.
But... why is an organization of PhDs solving CAPTCHAs? Even if they are the most expensive service, it still seems to offer terrible hourly returns. That only raises the mystery.
I don't understand what difference it makes that it's in Klingon. CAPTCHAs are usually just gibberish anyway. Recognizing individual letters remains the challenge.
Are you sure? I thought they were simply postulating that native speakers would more accurately 'capture' the characters of a word/phrase if they recognized it from their own language. If you know what a word is supposed to be, then it's easier to read messy writing. I realize I could find out by just reading the PDF paper, but I think it's worth answering here for the benefit of HN readers.
Interesting. I just read the PDF to see. They were sending CAPTCHAs where the goal was to translate a series of numbers written out in the native language into digits. So (une)-(deux)-(trois) --> 1 2 3, along with instructions for solving the CAPTCHA in the related language.
As best I can tell, the Klingon used wasn't even in a Latin script. So the correct answers were essentially translating arbitrary symbols into their associated numbers. Certainly throws off what I was assuming from the blog read.
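For anyone trying to picture the construction described above, here is a minimal Python sketch of how such a challenge could be checked. Everything in it is an assumption for illustration: the table names, the helper functions, and especially the stand-in glyphs, which are not real Klingon characters. The paper's actual generator may work differently.

```python
# Hypothetical answer keys: each maps a written number word (or symbol) to a digit.
FRENCH = {"une": 1, "deux": 2, "trois": 3}

# Stand-in glyphs, NOT real Klingon characters. To someone who doesn't know
# the script, the key is just an arbitrary symbol-to-digit table like this one.
MADE_UP_SCRIPT = {"◆": 1, "●": 2, "▲": 3}


def expected_answer(tokens, key):
    """Return the digit string a solver should type for the given tokens."""
    return "".join(str(key[t]) for t in tokens)


def is_correct(response, tokens, key):
    """Compare a solver's response to the key, ignoring whitespace."""
    return "".join(response.split()) == expected_answer(tokens, key)


print(expected_answer(["une", "deux", "trois"], FRENCH))  # 123
print(is_correct("3 1 2", ["▲", "◆", "●"], MADE_UP_SCRIPT))  # True
```

A worker who knows French reads the first challenge directly. For the second, the only ways in are knowing the script or having seen enough solved examples to reconstruct the table, which fits the "learned the correct answers" theory.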
They do note that the service, ImageToText, was the most expensive by far.
This is a rather surprising finding.