
For the car captchas, I've found actually clicking all the boxes with part of a car will always be a wrong answer (distinct from when it just makes you answer twice). Instead, you have to click on the squares that you know it thinks are cars.

This creates a twisted Turing test situation where, to prove you are a human, you have to pretend to be a machine's idea of what a human is.




It was the same when this was about words from old books. I always had to fill in the letters the average person would have thought they were, not what they actually were (e.g. the letter "f" for what really was an "s" in gothic type).

Nowadays it's much easier, you can click anything that looks vaguely the same (e.g. boxy things for cars, ads for traffic signs, traffic signs for store fronts etc.). The fact that it's so easy to poison the training set makes me very wary about the autonomous car future...


I actually like poisoning them. Not to be malicious but I feel manipulated into training their software for free. "Oh you wanted to sign up for that web forum? Sorry, but you have to do some free work for us first"

And if you think that it's somehow good because it's mutually beneficial to train AI to better the future of humanity, don't. That is what their marketing department wants you to think.


The value they're providing is to the forum owner, who has reduced spam-handling workload.

So the forum is providing you value, you are providing Google value, and Google is providing the forum value.


Sure, but if the public has to provide Google with free data, we should make laws requiring Google to open source the entire ReCaptcha training set.


Why?


Anything created with free work should be free in return. Anything created by the public should be available for the public.


To use an in-thread example - online forums are created with free work. Should all forums be forced to make their archives available for free download as well?


For free download? No. Should I be able to scrape them? Yes.


Yes.


It's not free though. You get access to the forum.

If it was free then you wouldn't be doing them!


Or it could be both what their marketing department wants you to think, and also reasonable.


I have more problems with bridges than with cars. The damn thing forces you to select 3 bridges, except that there are only 2... So you are forced to select what it thinks is a bridge, and confirm its erroneous bias even more.


Storefronts are difficult too, I don't think I'm ever good enough at those to satisfy it. The most reasonable one seems to be street signs, but I think it fails me for not flagging the unpainted back of one.


Sounds like poor question asking. "Click any square with a bridge" would be better wording.


I had to do 20 or more of these to get some post tracking data recently. Found the same thing with most objects where a very small part of an object hadn’t been classified as containing that object.


Does it tell you that you're wrong, or simply give you another set? If it is just giving you another set, it may be that it thinks that you can provide more useful data.


It just gives you another set, but it's so frustrating and annoying as a user that I can't believe they're doing this on purpose.


If you fail, it tells you you were wrong (some red text at the bottom) and gives you another challenge. I think it will sometimes just give you another challenge for more data, but it won't have red failure text.


> This creates a twisted Turing test situation where, to prove you are a human, you have to pretend to be a machine's idea of what a human is.

Exactly. I think Recaptcha was better when it was looking for consistency with other human answers. Using "AI" has the same problem you mentioned, plus it's more vulnerable because it has the assumption that your "AI" is unapproachably far ahead of competitors.
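
To make the distinction concrete, here is a toy sketch. It is purely illustrative and says nothing about how reCAPTCHA is actually implemented: the function names, the tile numbering, and the 0.5 agreement threshold are all assumptions. Consensus grading compares your selection with what earlier human solvers picked, while model grading compares it with a classifier's output.

```python
from collections import Counter

def consensus_tiles(past_answers, min_agreement=0.5):
    """Tiles picked by more than `min_agreement` of earlier solvers.

    `past_answers` is a list of sets of tile indices (hypothetical data).
    """
    counts = Counter(tile for answer in past_answers for tile in answer)
    return {t for t, c in counts.items() if c / len(past_answers) > min_agreement}

def passes(selection, reference):
    """Strict grading: the selection must match the reference exactly."""
    return set(selection) == set(reference)

# Hypothetical challenge: three earlier humans, and a model that misses tile 4.
past = [{0, 1, 4}, {0, 1, 4, 5}, {0, 1}]
model_tiles = {0, 1}
mine = {0, 1, 4}

consensus = consensus_tiles(past)
print(passes(mine, consensus))    # graded against other humans
print(passes(mine, model_tiles))  # graded against the model
```

With this made-up data the same answer passes against the human consensus and fails against the model, which is the situation described upthread.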


Street signs are a problem as well. Is the post part of the sign or not?


Twisted Turing test. There's a novel waiting to be written about this.


I went through exactly this today. Wasn't sure if it was the computer being dumb, or other people missing corners of signs or bridges etc. Still, takes me 5 goes every time.


The problem with the cars for me is their definition -- is that van classed as a car, how about the half-back, what about a 4x4, what about a 4x4 with no back windows, ...


What about a square that contains just a sliver of the car from the next box over? Does that still count? I hypothesize that my attempt to classify every pixel related to a car as a "car" may contribute to my failing of these tests.


This is what kurtisc is saying. The algorithm is unable to classify those pixels, so you have to guess which bits of the car can be detected by the algorithm and then select only those. So you have machines asking humans to think like a machine in order to prove they are human.
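
A small sketch of that guessing game, under loud assumptions: the 3x3 grid, the 100-pixel tiles, the bounding box, and the 10% coverage threshold are all invented for illustration, and nobody in this thread knows how the real grader decides. It only shows why "every tile containing any car pixel" and "tiles the grader counts" can differ.

```python
def tiles_for_box(box, grid=3, tile=100, min_coverage=0.0):
    """Return indices of grid tiles whose overlap with `box` covers
    more than `min_coverage` of the tile's area.

    `box` is (x0, y0, x1, y1) in pixels; tiles are numbered row by row.
    """
    x0, y0, x1, y1 = box
    selected = set()
    for row in range(grid):
        for col in range(grid):
            tx0, ty0 = col * tile, row * tile
            # Width and height of the overlap rectangle (0 if disjoint).
            ox = max(0, min(x1, tx0 + tile) - max(x0, tx0))
            oy = max(0, min(y1, ty0 + tile) - max(y0, ty0))
            if (ox * oy) / (tile * tile) > min_coverage:
                selected.add(row * grid + col)
    return selected

# Hypothetical car: a 10-pixel sliver pokes into the left-hand tile.
car = (90, 120, 290, 190)

literal_human = tiles_for_box(car)                     # any overlap counts
assumed_grader = tiles_for_box(car, min_coverage=0.1)  # slivers ignored

print(sorted(literal_human))
print(sorted(assumed_grader))
```

Here the literal answer includes the sliver tile and the assumed grader's answer does not, so a strict comparison would mark the more careful human as wrong.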


That was my experience with car and sign captchas as well.

But interestingly, it also depends on my mood: when I feel lazy, I click fewer boxes.


Isn't it supposed to learn from you? I.e., giving an answer that is technically correct but that it thinks is incorrect is slightly annoying for you, but better for the system in the long run (i.e. better for everyone).


Everyone? Or Google? I don't feel particularly happy about being used as a lab rat, so no, thank you - I don't care about the quality of Google's AI. If anything, I would purposefully mislead it if I knew how.


Don't you think that's how a lot of training data can be generated efficiently for future ML/AI breakthroughs?


So we will get even more "targeted ads" in our faces? No, thanks. I think ML/AI has a great potential (especially in medicine), but I just don't trust ad companies to use it for any good cause.


Has Google published this training set somewhere? Until they do, you're absolutely right that this is a great way to build a training set, but I don't see how it's to anyone's benefit but Google's.


I find the same thing with the road signs: other people are giving (imho) the wrong answer by not including squares that cover a small, but non-zero part of the sign.



