I've found that if you do more than one "advanced" search using these operators, you're flagged for "unusual activity" and have to do a captcha to prove you're not a bot.
I got "hellbanned" the other day -- google kept giving me captchas but always marked my answers incorrect. 20 minutes later, I gave up and drove to the library (!).
It reset itself the next day, but it really opened my eyes to the power they have slowly obtained over my life.
I tried making a throwaway account on this site recently.
It took about 5 minutes to get past the (Google) captcha.
First I clicked that I'm not a robot, but then it gave me the photos of doom. After clicking all the right pictures of, say, a bus, I had to wait while they faded out and new ones faded in, rather slowly. There were about 5 rounds of selecting. Then it would pick another noun and ask for that. After a few nouns of several rounds each, it cycled back to the first noun. A doubly-nested loop of time-delayed captchas!
Thankfully it wasn't important.
But it makes me appreciate how someone getting locked out of their own Gmail or Google Docs must feel.
Needed to do your job today? Tough luck, better luck tomorrow maybe.
I had this happen the other day too. I spent at least 5 minutes each going through 3 different captchas. The slow fade-in/fade-out thing was new and quite annoying, and the images were quite blurry as well.
I suspect that if you have a machine that is less likely to be fingerprinted uniquely, you have to solve more captchas. In this case, I happened to be using Firefox on Linux with a few privacy-related addons running. If they believe you are a bot, because you're not watching their ads, or because you are running a privacy-enhanced environment, I suspect their algorithms will slow you down.
It would be interesting to test this hypothesis. I wonder whether you sail right through the captchas with Windows + Chrome + no ad blockers, versus Linux + Firefox + ad block, or maybe even the Tor Browser.
My hypothesis: Google always shows the captchas now, even when it knows that you're really not a robot, because it knows you need its services and you'll probably solve them for free for up to a minute. Related post: https://news.ycombinator.com/item?id=18704960 (Ask HN: Do you think Google's recaptcha has gone greedy off late?)
I don't think you really got hellbanned, Google captchas have just become extremely buggy in recent months. Whenever I have to solve one it always takes me a few tries, sometimes a dozen or more, and I usually triple check whether I've actually selected all the traffic lights/bicycles/chimneys/whatevers before submitting...
(Alternative hypothesis: If you solve a captcha correctly, they intentionally give you a few more, to get more good data for their AI algos.)
I'm inclined to go with your alternative hypothesis. The quality of the captcha experience degrades the more times you solve it consecutively.
I was recently developing a web frontend that had a captcha implementation on the login form. The first couple of times I solved the captcha there was no problem. Once it got to the fourth and fifth one, it started making me solve a few captcha pages in a row. Eventually it was forcing me to solve several pages of "Select the picture of the fire hydrant" over and over, often marking the answer as incorrect when it definitely wasn't, and then restarting from the beginning. Each run-through took multiple minutes and forced me to solve dozens of captcha pages before it accepted anything as correct.
So, to add to your hypothesis, they're probably doing this intentionally not only to get more data for their AI algos, but also to mitigate people training bots on CAPTCHAs.
EDIT: Also, the fact that you're being downvoted is baffling to me, but my tinfoil-hat persona makes me wonder if it's some orchestrated downvoting because you're pointing out a hypothesis that Google is trying to keep hidden.
It's both. The more consecutive CAPTCHAs you solve in a short period, the more likely Google is to assume you're not human (or up to no good), and you definitely can get your Google browsing identity (however they tag you: some combination of IP, browser fingerprint, etc.) flagged to the point that they won't ever consider any number of correct responses a good indicator. I've done it many times, and yes, it usually is fixed within a few hours or by the next day.
I agree that a heuristic (machine learning or otherwise) designed to prevent spam was almost certainly the culprit. I disagree with the notion that this does not constitute a hellban, but I'm open to considering alternative nomenclature.
The fact stands that Google locked me out of my documents and intentionally undermined my ability to understand what had happened or rectify the situation.
It's probably similar to the old reCAPTCHA, where they give you a selection of known answers plus a couple of unknowns they're uncertain about. Probably some known-incorrect answers too, so that the grading works.
If you select the unknowns as well as all of the known answers (and don't select the wrong ones), they can teach their AI that the extras you selected are also part of the "has traffic lights" dataset.
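If that's right, the grading logic would look something like this sketch (my guess at the scheme; all names and types are invented):

    interface Tile {
      id: string;
      label: "positive" | "negative" | "unknown"; // server-side ground truth
    }

    function gradeAndCollect(tiles: Tile[], selected: Set<string>) {
      // Grade only on tiles with known labels.
      const passed = tiles.every((t) =>
        t.label === "positive" ? selected.has(t.id)
        : t.label === "negative" ? !selected.has(t.id)
        : true // unknown tiles never affect the grade
      );

      // If the user passed, trust their clicks on the unknown tiles
      // and record them as crowd-sourced training labels.
      const newLabels = passed
        ? tiles
            .filter((t) => t.label === "unknown")
            .map((t) => ({ id: t.id, hasObject: selected.has(t.id) }))
        : [];

      return { passed, newLabels };
    }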
Sometimes I use one of these operators by mistake and get instantly hellbanned from using Google. The first time, the captchas kept reappearing for a whole day. I read somewhere that the ban is reset the next day, but the next day I was still getting the captchas and couldn't use Google for anything.
I deleted all cookies and suddenly I wasn't banned anymore. So this looks to me like a captcha bug, or a chrome bug, or both. Probably both.
I still browse at the library myself occasionally, but ... ddg? Bing? Startpage? Searx?
There are many other search engines as well as other ways to access the google index.
Also, a couple of years ago someone posted a search engine he wrote, which was able to index the internet for ~$200/month hosting costs. Sure, it doesn’t do partial or full javascript eval like google, so it doesn’t index the deeper web. But “laser air breakdown” is likely mostly in plain HTML docs.
Yep, reCAPTCHA bans work like that: they don't actually tell you they're never going to let you proceed, they just let you waste your time thinking that they will, while profiting off you.
It's a perfect example of passive-aggressive dysfunction when it comes to big tech.
Google's reCAPTCHA has become a cancer and I really wish companies would stop using it (I think they are trying to block everyone who still uses a PC). The other day I wasn't even able to create a new Reddit account because of it.
I've encountered the same issue.
On top of that, you're no longer allowed to search for specific strings with the '"' operator. It just ignores it. What a world we live in.
I can dictate an email to my phone or search videos based on the semantic interpretation of their automated audio transcriptions (yes I've noticed Youtube will sometimes do this) but I can't search for some unpopular technical term because I'm obviously a bot if I do this. Thanks for correcting me Google, of course I meant to search about "laser hair removal", not "laser air breakdown".
Yeah, this is some Cloudflare-level "security" abused to the point where we enter a strange paradoxical state: advanced features exist, but nobody is allowed to use them enough to actually learn them and use them often.
Honestly, I bet it's a learning algorithm they use to identify bots, rather than something a human decided.
Spam detection algorithms usually have a training set. One of the features could have been "uses_feature_x", which, according to the training set, correlates strongly with being a spam bot (because humans rarely use those features).
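Something like this hypothetical sketch, where rarely-used advanced operators carry a high weight (every feature name and number here is made up for illustration):

    const weights: Record<string, number> = {
      uses_advanced_operators: 2.5, // humans rarely use these
      queries_per_minute: 1.8,
      has_cookie_history: -1.2,     // real browsers usually have one
    };

    function spamScore(features: Record<string, number>): number {
      return Object.entries(weights).reduce(
        (score, [name, w]) => score + w * (features[name] ?? 0),
        0
      );
    }

    // A single "site:... inurl:..." query can push a session over the
    // threshold that triggers the captcha wall.
    const showCaptcha =
      spamScore({ uses_advanced_operators: 1, queries_per_minute: 0.5 }) > 2.0;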
This is why we need more variety in search engines. Google does not want you doing "advanced" and "unusual" things with their service anymore, because they're scared that SEOs will use this to manipulate their search outcomes. But of course every SEO targets Google anyway, so the "plain vanilla" results are just as terrible!
The main reason they block this is that it's used by malware as an infection vector: it makes it easy to find vulnerable hosts running outdated webapps, for example.
Part of me thinks it's not their responsibility to do this. It's pretty paternalistic behavior. It seems typical of Google these days, though. I don't understand why they seem to be actively opposed to power users, all across their products.
> I don't understand why they seem to be actively opposed to power users, all across their products.
Power users are the ones who know how things work; they are difficult to manipulate and mislead, will use their knowledge to consume content in the way they want (blocking ads, stripping DRM, rooting devices, etc.), and are in general not docile and "obedient".
In summary: power users know how to control their destiny, and this is something companies like Google are opposed to, because those companies want to be the ones in control.
I feel that for some companies this has backfired in the long run. Take, for example, Blizzard, a video game company. They used to have a perfect reputation, but over time it has been tarnished by catering to the masses.
World of Warcraft became more and more accessible over time, and eventually reached a point where it was so accessible that no one wanted to play it since it wasn't fun and there was nothing to achieve.
Diablo 3 was meant to appeal to the mass market, but it was really obvious to everyone that the game was horrible, since it wasn't actually made for anyone in particular. Its design choices made it obvious, even to teenagers, that it was a profit-making machine.
I haven't really been a video gamer in years, but AFAIK they don't have the same reputation anymore and are just an ordinary company now, having changed their model from "amazing product" to "mass-market product".
No, the main reason I know about is that advanced operators are more resource-intensive than regular queries. Not blocking these would affect search for everybody else.
That, of course, also tends to take care of spammers, malware as you described and just plain buggy bots. Those can be tackled with additional measures, but first and foremost Google cares about search latency. (And quality: higher latency reduces quality, too.)
These days I have to wait longer simply because of Google's "auto-search" functionality and their heavy use of JS. It's like it's trying to guess what I want to see before I'm even done inputting a search query, which I'm sure is very convenient for novices, but not so much if you're a power user looking for something specific. Other search engines (Bing, DDG) don't have this problem, of course.
I think Google has just stopped even trying to maintain their "flagship" Search service; that's the only way I can explain how their search quality and website responsiveness could decline to the level of the old Yahoo.com and AltaVista. And don't get me started on the sorry state of their Google Groups archive...
You're right, in theory, but that's not how things work in reality at scale. Remember that every search query propagates through at least a thousand machines.
The fact that they're restricting this feature across the board suggests that they're more scared about SEO efforts. Malware-related searches would be easier to spot as an anomaly and filter out (and the article states that they do this in some cases).
They don't restrict it across the board; I use many of these operators all the time without issue. The only time I run into difficulties is when doing very specific "Google dork"-like queries repetitively.
That's exactly the SEO use case though. I mean, it's not exactly a secret that grey/black-hat SEOs want to do this, the problem is that the limits Google places on it inconvenience every power user too!
I must be missing something here - How would this impact SEO? Blackhat SEO is primarily about generating false backlinks to your site, often through blog and comment spam. Search queries to Google will have little if any impact, especially using fancier operators.
Currently running into this problem. I've gotten really frustrated trying to read the news lately; The Washington Post especially will say that I've used up all my free articles for the month and can't read any more.
Okay, that's fine, but then I'd rather not see them in my google results. If wapo wants to put their content behind a paywall, cool, but google should be an index of things I can read. So I've been experimenting with various search flags to exclude them. Started getting hit with captchas just now.
If you're on a Macintosh or iDevice, Apple News lets you block specific sources.
I use this to lock out sources that are behind a paywall, or that only feed Apple News one paragraph and require you to click/scroll through to their site for the full story. Often nothing is lost because those sites are only barely functional on mobile anyway.
Well, kinda. But it's been a few years since it really did. It defaults to "show them what we guess they want rather than what they literally asked for".
That's because we (the people who instinctively know which keywords to use for a search -- or at least we think we do) are the super minority.
Most people are absolutely terrible at keyword-based searching. Which is not very surprising, since most people are terrible at spelling out what they're looking for to begin with, and they're also terrible at turning a sentence into its important keywords, so doing those two things at once brings out the worst of both.
To this day, even with current Google helping them out a ton more, I'm still amazed at most people's complete inability to type out what they're searching for in any meaningful way.
Pay attention to how people ask their question when looking for something and you quickly realize they don't ask it in a logical way, or using the words you would expect. Pay attention to how many times, when someone asks you something, you feel like saying "What are you actually trying to find/do/achieve? What is actually your issue, instead of the half-way mess you just said that made no sense?" People are also really bad at realizing that and taking a step back, and tech users are not exempt; something like half of Stack Overflow questions are broken that way (totally made-up stat).
Then add a layer of "this is a computer, I need to computerize my query!" and you enter the land of weird.
PS: with that said, allowing a full verbatim mode AND an option to keep it activated at all times for my account would be totally awesome
> allowing a full verbatim mode AND an option to keep it activated at all times for my account would be totally awesome
You can mostly solve this by creating a keyword search that launches a verbatim search. Depending on your browser, the syntax probably looks something like this (with the essential part being the 'tbs=li:1'):
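    https://www.google.com/search?q=%s&tbs=li:1

(Assuming the standard Google search endpoint; the %s is the placeholder your browser substitutes with whatever you type after the keyword.)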
Then add a single-letter keyword (I use 'v') so that you can type 'v verbatim' to do a verbatim Google search for 'verbatim'. I suppose you could even make this the default search, and use something else if you ever want 'normal' mode.
Of course, this still leaves the problem that verbatim on Google doesn't always mean 'verbatim' anymore, but that part will have to wait until Google changes their ways (unless you put quotes around every word).
You could create a keyword that points to a site that takes a regular query and redirects you to a Google search with every provided word quoted. Though someone would have to create that site.
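Such a site would only be a few lines. A minimal sketch in TypeScript (my own untested guess at it; the port and endpoint are arbitrary):

    import { createServer } from "http";

    // Wrap every word of the incoming query in quotes, then bounce
    // the browser to a Google search for the quoted version.
    createServer((req, res) => {
      const url = new URL(req.url ?? "/", "http://localhost");
      const q = url.searchParams.get("q") ?? "";
      const quoted = q.trim().split(/\s+/).map((w) => `"${w}"`).join(" ");
      res.writeHead(302, {
        Location: "https://www.google.com/search?q=" + encodeURIComponent(quoted),
      });
      res.end();
    }).listen(8080);

Point a keyword search at http://localhost:8080/?q=%s and you're done.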
This is for the same reason that clicking once on the address bar in Chrom(ium) selects the entire text, and two clicks narrow it down to a single location, which seems the reverse of what usually happens. The design justification for this on Monorail was something like "most people use the address bar to perform new searches", instead of editing the URL they were already at, so it was better to save them that one click.
You give "most people" (not yourself, of course...) both too much and too little credit. Pure keyword search was predictable: you typed in the words you wanted to appear on the page, the computer did a bit of stemming, and you got back results that all contained all of those words. It was simple and predictable, so pretty much anyone could figure out how to control it. "Power users" could learn some special syntax to do more.
Current attempts to try to make everything a conversational AI are pure failure. Whoever is in charge of Siri/Alexa/Googlette/Whatever thinks people are saying "hey X, do Y" as if talking to a human. In reality, those people are trying to figure out the magical sequence of words and intonations that will make the thing do what they want, frustratingly and poorly reverse-engineering some inhuman and constantly-changing robot.
They are stupidly obsessed with the "everything is a search" philosophy and end up irritating users and on occasion disgracing themselves. For example, today if you click through their own link for "S&P 500 Index" on Google Finance, they translate that to a search for the term "INDEXSP: .INX" and fail with "Your search - INDEXSP: .INX - did not match any finance results." It works for Dow Jones and other indexes.
Other examples that bother me: randomly, I don't see my flights in the Google Assistant on my Pixel. I then have to type "My flights" in the search box to see them. Same for weather, hotel bookings, etc.
Some are out of date, I think. I used the "define:" one so often that it became muscle memory for me. I was saddened when it stopped working a couple months ago. It'll still sometimes give you the word definition card, but I think only when just typing the word would give you the same card. I switched to typing "$word dictionary" instead, as I found it to trigger the definition card in almost every case where "define:$word" no longer would.
Of course, this relies on what data Google has on hand. Many of the examples provided in the article work because Google has so much data about "Apple" searches. The more niche the search operator, the more likely no results will surface. That being said, I've always found the "site:", "inurl:", and "+" or "-" operators to be incredibly useful for research reasons.
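For example (illustrative queries of my own; results will vary):

    "laser air breakdown" site:arxiv.org
    intitle:changelog inurl:v2 -site:github.com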
Define is broken in other fun ways. Try "define:inception". It won't tell you anything about the word, but everything about the movie unless you put it in quotes. But if you use something that's not, coincidentally, a movie title it works just fine.
But do these operators actually perform deterministically? Does the page actually contain a match for my terms? In my experience, this has become less and less true.
The first obvious mistake in the list is the claim that Google defaults to AND; it hasn't for ages. A list of two terms will return some random combination: one of them, maybe both of them, and occasionally neither (and no, this doesn't happen due to fetch/indexing lag or stemming or autocorrect).
To get true inclusive searches you need to quote the terms individually. People keep pointing at verbatim search, but verbatim performs a phrase search, which is not what you want in most cases.
DDG suffers from the same problem. I curse them both. I've used some JS to quote individual terms before performing the search, to get back useful results for technical terms.
But really, the number of times I now get pages which do not contain the exact terms I'm looking for is subjectively increasing.
If you do this too much, Google will make you solve a verification image.
Also, Google purposely returns error codes when searching certain numerical string ranges, probably to prevent people from searching for carelessly indexed credit card and Social Security number databases.
Google likely filters out (obvious) credit card and Social Security numbers. My theory is that they suspect another company might hire a third-world click farm to go through and grab large amounts of specific data (or obtain it via other methods, like a rogue extension running on many "known good" users' Google accounts), and they're trying to discourage scraping Google for that information. They would much rather you go through the hassle of scraping websites yourself.
I once read that 85% of Google searches are repeat searches, queries that Google has "seen" before. If this is false, please ignore the rest of this comment.
If 85% of searches are repeated queries, does this mean Google will treat a query differently if it is a repeated one?
If a query looks like a repeat query, is it funnelled into retrieving a set of predetermined results?
(To be clear, I am referring to queries that do not match exactly.)
No doubt this would be much faster than a "dynamic" search where any similarities to prior queries are ignored.
To keep things "fast" it might be necessary to treat the 85% differently from the 15%.
It might be beneficial for the search engine provider to encourage users to make repeat searches rather than create new ones.
As we know, Google is not transparent about the series of steps they use to produce search results from their web cache, so these questions are likely to remain unanswered.
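Still, the funneling idea is easy to picture. A toy sketch (pure speculation about Google internals; every name here is mine):

    declare function expensiveDynamicSearch(q: string): string[];

    const cache = new Map<string, string[]>();

    // Collapse near-duplicate queries onto one cache key.
    function normalize(q: string): string {
      return q.toLowerCase().trim().split(/\s+/).sort().join(" ");
    }

    function search(q: string): string[] {
      const hit = cache.get(normalize(q));
      if (hit) return hit; // the "85%" path: predetermined results

      const results = expensiveDynamicSearch(q); // the "15%" path
      cache.set(normalize(q), results);
      return results;
    }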
>Search for X and Y. This will return only results related to both X and Y. Note: It doesn’t really make much difference for regular searches, as Google defaults to “AND” anyway. But it’s very useful when paired with other operators.
The author must be using a different google than me. It's been many, many years since google functioned as an AND search. It very frequently decides to drop one or more words from my search if there is a low number of results, it's extremely annoying. Once you could force the old behaviour with +<term>, and then later with quoting ("term"). Both of those now also tend to drop search terms for me.
If anyone has an actually reliable way to get a real AND search out of google, I'm all ears.
I frequently use the dash inside a word so that for example a search for sub-optimal returns results including "sub-optimal", "suboptimal" and "sub optimal".
We just had the challenge of searching for a raw-materials supplier from within the European Union. Any normal Google query was completely useless, so I hacked together this EU & EWR Google Search
I tried !ddg and !duck but it does not quite work.
This Google search engine feels a bit strange and lacking. I hope they will catch up in this competitive market some day. I want to like it because it has a few nice features and I've tried it a few times but I always return to my good old regular search engine.
Fewer stupid A/B tests (sometimes I'd get idiotic results until I reported an error; it'll then fix itself magically as soon as my account is removed from the test ;-)
Less annoying "we know better than you what you wanted to search for" behavior.
I was joking, and I actually don't use bangs. I could have made the joke and not actually use Duck Duck Go.
Though I use it so... not giving any data to Google is the killer feature for me. I like the idea that the results only depend on the keywords and not me. Instant results are quite nice too.
I'd like not to indirectly depend on Microsoft though.
Edit: and I'm not bound to Duck Duck Go. I'd be glad to use something else too.
I am adding this list to my personal "How to be a software engineer" syllabus, for when friends ask me how to get started. Search is a skill at the base of the pyramid. I don't care how much expertise one has, they need to know how to find information, fast.
I feel nervous about making myself dependent on them because it's more likely than not that Google would pull the rug from under us once we get comfortable with using them.
What gives you this confidence? I discovered the list when I was trying to figure out why "link:" was no longer working, after it had existed for more than a decade. In another comment here, TeMPOraL says that "define:" (which is on the list) doesn't seem to be supported anymore. If it's not making money for Google, and it's not necessary for their internal activities, I wouldn't want to depend on any of these working in the future.
The + operator was removed despite being there for a long time.
Also, there's a whole bunch of wrong info in the article. The most blatant example is the claim that Google by default performs an AND search, which even cursory use of Google will demonstrate is not true.
By abused you probably meant abused without payment to Google, or abused by people external to Google. People who pay or are Google can continue their abuse.
I thought I could set them up myself on DDG, but I see now that the keywords are the same for everyone. I guess this means I can't use super short keywords for the sites I frequent most. That's a shame. For example, in Firefox I just prefix my search query with "s" to search stackage.org.
Also, even if they have thousands, I don't think it would be too hard to find a site they don't support. Who knows how long they'd take to add the keyword on request, if they decide to add it.
Pros and cons. No solution is perfect. I guess I'd find DDG's feature more useful if I relied more on my phone for such searches or used random computers I can't set up for myself.
Thanks for the links. I did not know about "inurl" and "intitle"; these will be handy for helping me re-find results, which is something I find myself doing a lot more than I would expect.
Always have proxies when doing a long list of those, or you will get banned for 24 to 32 hours, or, even worse, get the captchas from hell that never seem to work.
Unfortunately, public proxies are more than likely going to be banned already. If you have a dynamic IP, that tends to work better (although that makes me wonder if it's possible to get a whole subnet banned, enough to piss off a lot of others on the same ISP, and thus put more pressure on Google to not do it or at least give the practice more public exposure. "Man gets city banned from Google by using Google search to do what it was designed for" would be a funny headline...)
My search terms often get turned into something else for no reason. If I search for any less common word, there's a good chance the results would not contain that word by default. Maybe I'm crazy, but is Google really improving over time, or actually getting worse?
Google has increasingly moved away from strictly honoring advanced search terms. Excluding terms (-) or using the exact phrase search no longer returns the desired result. Frustrating.
> AND: Search for X and Y. This will return only results related to both X and Y. Note: It doesn’t really make much difference for regular searches, as Google defaults to “AND” anyway. But it’s very useful when paired with other operators.
If space equates to AND, then how is it useful? There's really no technical reason to use AND, right? I mean, even if you nest:
good1 OR (good2 AND -bad)
I imagine it should be equal to:
good1 OR (good2 -bad)
> define:entrepreneur
The : is not needed; one can use a space. Also, I imagine this isn't really an operator, since it doesn't make sense to mix it with any other. Same with "weather:".
It defaults to some weird combination like "OR but show ANDs first." Very often when I search for one term it will show X results. When I add a second term it shows Y results, where Y>>X. That's logically impossible with AND.
A search with good1 OR (good2 AND -bad) says 549,000 results; good1 OR (good2 -bad) says 2,000,000, and the first page of results looks entirely different.