
So is this the end of Google captchas asking for where the car/sign/whatever is? Will there be a final battle of AIs, where they will kill each other, and the unfettered access to websites over VPN/tor wins and laughs the last laugh?



For the car captchas, I've found actually clicking all the boxes with part of a car will always be a wrong answer (distinct from when it just makes you answer twice). Instead, you have to click on the squares that you know it thinks are cars.

This creates a twisted Turing test situation where, to prove you are a human, you have to pretend to be a machine's idea of what a human is.


It was the same when this was about words from old books. I always had to fill in the letters the average person would have thought them to be, not what they actually were (e.g. the letter "f" for what was really an "s" in gothic type).

Nowadays it's much easier: you can click anything that looks vaguely the same (e.g. boxy things for cars, ads for traffic signs, traffic signs for store fronts, etc.). The fact that it's so easy to poison the training set makes me very wary about the autonomous-car future...


I actually like poisoning them. Not to be malicious, but I feel manipulated into training their software for free. "Oh, you wanted to sign up for that web forum? Sorry, but you have to do some free work for us first."

And if you think that it's somehow good because it's mutually beneficial to train AI to better the future of humanity, don't. That is what their marketing department wants you to think.


The value they're providing is to the forum owner, who has reduced spam-handling workload.

So the forum is providing you value, you are providing Google value, and Google is providing the forum value.


Sure, but if the public has to provide Google with free data, we should make laws requiring Google to open source the entire ReCaptcha training set.


Why?


Anything created with free work should be free in return. Anything created by the public should be available for the public.


To use an in-thread example - online forums are created with free work. Should all forums be forced to make their archives available for free download as well?


For free download? No. Should I be able to scrape them? Yes.


Yes.


It's not free though. You get access to the forum.

If it were free work, you wouldn't be doing it!


Or it could be both what their marketing department wants you to think, and also reasonable.


I have more problems with bridges than with cars. The damn thing forces you to select 3 bridges, except that there are only 2... So you are forced to select what it thinks is a bridge, and confirm its erroneous bias even more.


Storefronts are difficult too, I don't think I'm ever good enough at those to satisfy it. The most reasonable one seems to be street signs, but I think it fails me for not flagging the unpainted back of one.


Sounds like a poorly phrased question. "Click any square with a bridge" would be better wording.


I had to do 20 or more of these to get some post tracking data recently. Found the same thing with most objects where a very small part of an object hadn’t been classified as containing that object.


Does it tell you that you're wrong, or simply give you another set? If it is just giving you another set, it may be that it thinks you can provide more useful data.


It just gives you another set, but it's so frustrating and annoying as a user that I can't believe they're doing this on purpose.


If you fail, it tells you you were wrong (some red text at the bottom) and gives you another challenge. I think it will sometimes just give you another challenge for more data, but it won't have red failure text.


> This creates a twisted Turing test situation where, to prove you are a human, you have to pretend to be a machine's idea of what a human is.

Exactly. I think Recaptcha was better when it was looking for consistency with other humans' answers. Using "AI" has the same problem you mentioned, plus it's more vulnerable, because it rests on the assumption that your "AI" is unapproachably far ahead of competitors'.


Street signs are a problem as well. Is the post part of the sign or not?


Twisted Turing test. There's a novel waiting to be written about this.


I went through exactly this today. Wasn't sure if it was the computer being dumb, or other people missing corners of signs or bridges etc. Still, takes me 5 goes every time.


The problem with the cars for me is their definition -- is that van classed as a car? How about the hatchback? What about a 4x4? What about a 4x4 with no back windows? ...


What about a square that contains just a sliver of the car from the next box over? Does that still count? I hypothesize that my attempt to classify every pixel related to a car as a "car" may contribute to my failing of these tests.


This is what kurtisc is saying. The algorithm is unable to classify those pixels, so you have to guess which bits of the car can be detected by the algorithm and then select only those. So you have machines asking humans to think like a machine in order to prove they are human.


That was my experience with car and sign captchas as well.

But interestingly, it also depends on my mood, when I feel lazy, I click fewer boxes.


Isn't it supposed to learn from you? I.e., answering in a way that's technically correct but that it thinks is incorrect is slightly annoying for you, but better for the system in the long run (i.e. better for everyone).


Everyone? Or Google? I don't feel particularly happy about being used as a lab rat, so no, thank you - I don't care about the quality of Google's AI. If anything, I would purposefully mislead it if I knew how.


Don't you think that's how a lot of training data can be generated efficiently for future ML/AI breakthroughs?


So we will get even more "targeted ads" in our faces? No, thanks. I think ML/AI has a great potential (especially in medicine), but I just don't trust ad companies to use it for any good cause.


Has Google published this training set somewhere? Until they do, you're absolutely right that this is a great way to build a training set, but I don't see how it's to anyone's benefit but Google's.


I find the same thing with the road signs: other people are giving (imho) the wrong answer by not including squares that cover a small, but non-zero, part of the sign.


Has someone ever tried to submit images from Google captchas to Google Images?

An answer like "This is definitely a sign" from Google Images would be funny.



Of course if you start googling for street signs a lot, they'll hit you with a captcha for each search, so that you can't ask Google to solve their captchas for you until you solve their captchas.


Now try it with the Speed Limit sign. Does that count as a "street sign" or only ones with street names?!


Your SSL cert is for the wrong domain.


Your https-everywhere is doing it wrong. I don't have an SSL cert on my personal site (no requirement to). I do have SSL on other domains that share that IP, however.


The idea that you have no requirement to is wrong - your site may not have sensitive information but without SSL it can be MITM'd and used as a vector for malware, ad injection, etc.


Your argument is valid (somewhat), but I don't think I attract enough traffic on a constant basis to invest the time and effort to SSL up all my domains.

So far I've had a less than stellar experience with letsencrypt, so I'm not quite ready to go all free-certs just quite yet. It also requires a rebuild of my web-server[1] which I've been putting off for a very long time already.

---

[1] See my other post in this thread.


Why not just use Cloudflare? They do SSL termination for free, plus you'd get caching to boot.


So long as your site doesn't have SSL you shouldn't be linking people to it. You're under no obligation to drop everything and fix your web server's security, but you should really stop using it until you do.


I'm no expert on this, but what I'm seeing is that port 443 is serving up a response when jaruzel.com is requested on that port. While you may not be actively advertising that domain with https URLs, it is valid for clients to request one speculatively.

There isn't a valid cert for that domain, and for some reason the server is offering a different one. Presumably you need to unbind 443 from that host header name (this is based on memories of configuring IIS a decade ago).


"port 443 is serving up a response when jaruzel.com is requested on that port"

The only response is a 404, which is exactly what should be displayed (to the best of my knowledge) for a domain that isn't configured for that IP/port when there are other sites utilizing that IP/port.


Oh dear, this hasn't gone very well has it. I'll have another look when I'm home, I thought I was closer to the mark. Thank you for the response.


So...

I have an IP... that IP points to a router, that router port-forwards ports 80 and 443 blindly to a web server, on that web server is a bunch of websites. IIS knows which ones to serve to clients based on a) the host-header, and b) the port.

jaruzel.com:443 is not valid, but because I run an older version of IIS[1], that does not support SNI, the cert is bound to the port, not the host-header. As such any domain name that points to the IP will dump you at that cert if you try to connect on port 443.

Hope this clears up any confusion. :)

---

[1] for um... reasons.
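The SNI behaviour described above can be sketched as a tiny cert-selection function (all hostnames and cert names below are hypothetical): an SNI-aware server picks a certificate using the hostname sent in the TLS ClientHello, while a pre-SNI server has exactly one cert bound to the IP:port, so every hostname pointing at that IP gets that cert on 443.

```python
# Sketch of certificate selection with and without SNI support.
# Hostnames and cert identifiers are invented for illustration.

def select_cert(certs_by_host, port_bound_cert, sni_hostname=None):
    """Return the cert a server would present for a TLS handshake.

    certs_by_host:   dict mapping hostname -> cert identifier (SNI table)
    port_bound_cert: the single cert bound to the IP:port (pre-SNI setup)
    sni_hostname:    hostname from the ClientHello, or None if absent
    """
    if sni_hostname in certs_by_host:
        return certs_by_host[sni_hostname]
    # No SNI, or an unknown name: fall back to the port-bound cert.
    # This fallback is exactly the "wrong cert" mismatch discussed above.
    return port_bound_cert

certs = {"shop.example": "cert-shop", "blog.example": "cert-blog"}

# With SNI, each hostname gets its own cert:
print(select_cert(certs, "cert-shop", "blog.example"))  # cert-blog

# An old server with no SNI table: any name on the IP gets the port cert:
print(select_cert({}, "cert-shop", "other.example"))    # cert-shop
```

The same lookup-then-fallback shape is what SNI-capable servers implement internally; older IIS versions simply lack the lookup step.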


Thank you, I appreciate the detailed response :-)


Complaining about a non-existent SSL cert and then backseat driving the "fix", using words like "you need to", all based on shady memories of configuring IIS a decade ago?

Really?


Just having a chat about it really, sorry it rubbed you up the wrong way.


I'm cool with it. I like these sort of side-bar conversations.



This is some of the most advanced work out there, but CV is not "solved": most vision systems can only label about 1k categories of objects. So captchas can still be easily constructed that would fool these systems. That's part of why it is exciting to get this out there, so others can help us improve it.


But how many more classes can all users solve? If you had to identify all the oaks in a group of trees, I doubt many would solve that correctly.


Imagenet is the only reason most models are trained on 1K categories. There are plenty of models in the wild that handle tens of thousands of classes.


With what precision and recall?


Not that much worse than what you see on imagenet. Most large companies have internal datasets with >100 million images.

https://arxiv.org/abs/1610.02357
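For readers wondering how the precision/recall question above gets answered in practice: per-class precision and recall can be tallied straight from (true, predicted) label pairs. A toy sketch with made-up labels:

```python
# Per-class precision/recall from (true, predicted) label pairs.
# The example data is invented purely for illustration.
from collections import Counter

def per_class_precision_recall(pairs):
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in pairs:
        if true == pred:
            tp[true] += 1   # correct prediction for this class
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[true] += 1   # true class gets a false negative
    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = (precision, recall)
    return scores

pairs = [("cat", "cat"), ("cat", "dog"), ("dog", "dog"), ("dog", "dog")]
print(per_class_precision_recall(pairs))
# cat: precision 1.0, recall 0.5; dog: precision ~0.67, recall 1.0
```

With thousands of classes these numbers are usually averaged (macro or weighted), which is why a single headline accuracy figure can hide very uneven per-class behaviour.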


That's impressive work. I still don't think we have reached human level for all the categories of things we see in images. But you are correct that my comment about 1k categories is not true for many production systems.


Whilst there are plenty of things in CV that computers aren't super-human at yet, object classification (given 100+ examples) is not one of them. In datasets with tens of thousands of categories, humans are much worse than computers - e.g. humans are really not good at knowing the difference between every type of mushroom, algae, and model of airplane.

Further, nearly every time a computer has recently been trained to do some very nuanced classification, such as in radiology, it has exceeded human expert performance.

(Outside of classification, computers are rapidly making progress - for instance they are getting surprisingly good at predicting the next few frames of a video, which requires a lot of "world knowledge" to do correctly.)
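For context, the human-vs-machine comparisons behind claims like the one above are usually quoted as top-5 accuracy on ImageNet: a prediction counts as correct if the true label appears among the model's five highest-scoring classes. A minimal sketch with invented scores:

```python
# Top-k accuracy check, the metric typically behind ImageNet
# human-vs-computer comparisons. Scores below are made up.

def in_top_k(scores, true_label, k=5):
    """scores: dict mapping class label -> confidence."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return true_label in top

scores = {"fly_agaric": 0.40, "panther_cap": 0.30, "death_cap": 0.10,
          "russula": 0.09, "chanterelle": 0.07, "morel": 0.04}

print(in_top_k(scores, "russula"))  # True: 4th-highest score
print(in_top_k(scores, "morel"))    # False: 6th-highest score
```

Top-5 rather than top-1 is used precisely because fine-grained categories (mushroom species, aircraft models) are often ambiguous even to expert humans.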


Definitely not close to having things work for all categories. As you scale up to more categories, ambiguity and specificity become an issue. Clarifai has a nice demo of their model, which has >10K classes: https://clarifai.com/demo . The top predictions are usually correct but not always the most relevant.

I only linked to the xception paper because it mentions JFT. It's not state of the art for large scale recognition.


It's not just a matter of detecting objects and locating them. The deeper computer vision problem is to identify object attributes, relations between objects and actions in video. It's much harder to do that because many relations appear in very diverse situations, with objects of different categories, so it's hard to have 1000's of examples for each class of relation.

For example, humans can identify a monkey riding a Segway on the airport runway, but there probably is no such thing in the training set, even if it is quite large. The neural net might not know if that constitutes a "riding" action because it has never seen such a combination. Maybe the monkey is jumping over the thing and the picture shows it in proximity to it, not riding it - a human would know that a slight gap means there is no riding taking place.

Then, the even harder problem is to predict the consequences of actions on objects and just to physically simulate the scene. Such knowledge is useful in robot action planning. Beyond computer vision, there is also a need to create a "mental simulator" that has theory of mind and can simulate other agents (what humans intend), and we need simulators, both physical and mental to create the next level of AI.


Interesting. Can you tell me what the state of the art for large scale recognition is? I'd like to read more about it. Thank you.


You don't need human level to make CAPTCHAs useless. If you can break the CAPTCHA 10% of the time that's already enough.
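To put a number on that (assuming, for illustration, a flat 10% per-attempt solve rate): a bot can simply retry, so its chance of at least one successful solve climbs quickly with attempts.

```python
# Retry arithmetic behind "10% is enough" to defeat a CAPTCHA.
# The 10% per-attempt rate is an illustrative assumption.

def p_solved_within(attempts, p_per_attempt=0.10):
    """Probability of at least one successful solve in n attempts."""
    return 1 - (1 - p_per_attempt) ** attempts

for n in (1, 10, 50):
    print(n, round(p_solved_within(n), 3))
# 1 -> 0.1, 10 -> 0.651, 50 -> 0.995
```

At scale, attempts are nearly free for an attacker, so any non-trivial per-attempt solve rate makes the CAPTCHA a rate limiter at best, not a barrier.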


In case you didn't realize, the guy you're talking to is the CTO of what some may call a "large company."


Doesn’t mean I’m right :) - see above as he had a point.


Assuming from context you're CTO of Facebook, the facebook.com/schrep link in your profile isn't working:

This page isn't available. The link you followed may be broken, or the page may have been removed.


The link only works if you are logged in; otherwise it says the page was not found, which is the wrong message because it makes you think the page doesn't exist even if you then log in.


Worked when I logged in, thanks :)


It works for me.


And I enjoyed the conversation between two people that obviously know the field pretty well.


Wow that's a great catch.


Google doesn't even show a captcha if they have enough tracking info to verify you're a human, which is pretty simple for them if you don't clear all your cookies for a few days. I'm pretty sure Google thinks that if they have to show you a captcha they've failed, but along with that they don't feel the need to make the captchas particularly easy if they do have to show one.


Except when it doesn't. Every so often it suddenly spams you with a sheer number of captchas on sites that use Google's Recaptcha API. Then you have to solve the detect-the-houses, cars, store-fronts, license plates and street signs several times successfully to move on, or be prepared to solve several more captchas before being allowed to continue. I wish Google would fix Recaptcha. It used to be so good.


Are you using an adblocker or any other privacy extensions (PrivacyBadger, Disconnect, etc.) ?

If you are, Google will spam you to death with captchas; it kinda makes sense because captchas are getting easier to solve for machines, so apparently the new test of humanity is whether Google can track your activity on other sites.


Whenever I get spammed with captchas I assume the site is somehow redirecting them to me from a bot.


I hate it when I fail the Turing test, and can't log in to stuff.


Obviously I am 99.99% sure I am clicking correctly, yet Recaptcha shows me captcha after captcha until it lets me move on. Is it a system bug, or has their bot detection become quite unreliable?


Neither. When Google finds someone willing to train its AI for free, it likes to take advantage.


You are a bit sarcastic. But what can a user do? Either you keep solving "train their AI for free" challenges until they let you move on, or you leave the site?


You can complain to the site operator. Google is not forcing their captcha on websites. If enough people do that, perhaps site operators will take notice. Tell them exactly what you don't like, so they don't eventually just change to a different vendor with the same annoying captcha tech.


I remember one particular set of pictures with both buses and coaches (and some with neither) on it, and it asked me to click on the buses. It said I'd got it wrong, but presented an almost identical set afterwards, and I included the coaches, but that was also wrong.


I've seen this happen, and I think it was because the IP address was previously used for scraping of some sort, or somehow set off some flags at Google. VPN providers can cause this because sometimes their IPs are used for just such things.


It doesn't seem to matter which images I click, it lets me in after a while anyway.


I must be really unlucky in that I regularly get pretty challenging captchas which require a lot of tries before it lets me through. It's frustrating enough to make me avoid future visits to sites which use their captchas.


I take it you don't use a VPN. I often have to pass a captcha just to use Google search when on an IPv4 VPN address. (Considering that ISPs are now allowed to sell my data, using a VPN seems an obvious choice.)

It's annoying to the point that it's pushed me to use DuckDuckGo more often and I tend to avoid platforms that require me to continually take their captchas. I used Discord for a little while, but once it started asking me to verify my humanity again periodically per session, I booked it.


Have you tried Startpage? Lets you Google search via their proxy servers, no issues with tor/vpn that I recall.


This might be exactly what I need. Thank you!


So, your traffic is coming from some provider that purposefully obfuscates info, and your traffic is mixed with a bunch of other people's? I can't imagine why they'd view you as less likely to be verified as a real person...


I'm not saying that it's an unreasonable assessment on their part, but it is a large annoyance.

My question is: are they doing this to simply get more training data for image classification, reduce server load by minimizing automated traffic, or to sanitize their queries for human input for NLP models?


> My question is: are they doing this to simply get more training data for image classification, reduce server load by minimizing automated traffic, or to sanitize their queries for human input for NLP models?

As someone who works in an industry where CAPTCHAs have historically played a large role, and where some players flat out use technology to bypass them, doing so through proxy and/or VPN services to get good IP addresses, I imagine those automated systems corrupt the CAPTCHA system somewhat, since it looks like a large corpus of humans behaving in a certain manner when it's not humans at all. It likely also causes those IPs to be considered highly suspect by the CAPTCHA system whenever they're encountered.

For your next questions: the industry is event ticket resale, and no, we don't do that (there are aboveboard ways to function in this market that rely less on brute force and more on data mining and analysis for specific targeted investment, sometimes long after tickets have gone on sale).


No, they are likely using that data to build self-driving car algorithms. I would imagine that's why it's always asking you to detect road signs.


The future is going to be your corporation/country/blockchain's AI vs your adversary's corporation/country/blockchain's AI with vast numbers of humans in the middle of the whole sh*tstorm just trying to survive and live a tolerable life.


That's good, though, I wouldn't want a future where a single AI has all the power. We need a multitude of AIs to increase equality for humans and diversity for AI.


More likely websites will just block Tor exit nodes plus whole swaths of IP space (i.e. all AWS+GCP+Azure+DO etc. IPs).


Yup, this seems to be a pretty big factor. When I use a VPN to surf the web I'm usually stuck in captcha hell for the most simple and mundane things.


Amazon will have a new captcha asking how many boxes of Cheez-Its this person is holding on their way out of a store.


I get it. lol



