IMHO Google has been in decline since around 2010, but it's only recently that the dive in quality became very noticeable. Try searching for anything even mildly technical outside of software, and you are presented with pages upon pages of completely irrelevant SEO spam. Repair manuals for very old cars are one example: it used to be that you could easily find a link to a PDF, and the results were otherwise mostly relevant; now you get only sites claiming to sell it to you, plus more SEO spam.
Two more examples are error messages and IC part markings --- searches are flooded with results that do not even contain all the words in the query. I didn't put those words there for no reason; ignoring them is absolutely unacceptable. It becomes ridiculous when you search for error numbers: a query containing the exact number and the word "error" gets flooded with useless results about other errors.
I'm just glad I'm not alone, sometimes I second guess if I am the problem.
Currently, just to get basically workable results, I find myself putting "every" "keyword" "in" "quotes" (to make sure they actually appear in the result pages at all), using the site: modifier to restrict results to real sites, and adding negative modifiers (-"keyword") to filter out some SEO results.
Google used to be "magic" in that it knew what you were thinking, and gave you what you wanted instead of what you asked for. These days it is just page after page of auto-generated results, pages that don't contain anything relevant to your query, or just low quality results.
I'm not going to pretend I'm an expert in Google's search, and I'm sure they're meeting some metric or another, but from my perspective things have gone seriously downhill to the point where I am looking elsewhere.
It used to be THE de facto technical search engine. Now it feels like a search engine you have to hack to get it to work well. Not a good place to be.
PS - I have read, in HN comments (so pure rumor) from self-proclaimed ex-Googlers, that Google's internal culture punishes people for improvements/maintenance to existing products, and that promotions come from developing new products/features. If even semi-accurate, that might go a long way to explaining why Google Search feels neglected, aside from changes which seem to exist to improve their bottom line/promote sister products.
The problem isn't just SEO, it's that Google itself aggressively rewrites queries to produce more results (which I suspect they want to do to show more "relevant" ads).
On the most extreme end of this, I've seen four-word queries produce results in which three of the words were struck out. More often, it's just one word, but it's usually exactly the one that makes the difference between a very specific query and a very generic one.
Worse yet is that they try to do synonym substitution, but their algorithm has a ridiculously low bar for that. Like, you might be searching for "FreeBSD", and it will substitute that for "Linux", or even "Ubuntu". Or search for a specific firearm model, and it finds "gun".
Quoting keywords suppresses all of that, but synonyms would actually be useful --- if they were applied accurately...
I left Google in 2010, so it's just a wild guess, but I suspect a big part of the issue is that learn-to-rank is being trained on everyone's searches. I think it would probably do much better if they used the presence or absence of search operators as a simple heuristic to separate power user searches from common searches, and trained a separate ranking model on power user searches.
Maybe they're already doing this, but it sure acts like learn-to-rank is always ranking pages as if the query were very sloppy.
It's been a long time, and I certainly never read the code, but I vaguely remember a Google colleague mentioning something (before learn-to-rank) about a back-end query optimizer branch that would intentionally disable much of the query munging if there were any search operators in the query. There was some mention about using cookies / user account information to do the same if the same browser/user had used any search operators in the past N days, but I'm not sure if that was implemented or just being floated as a useful optimization.
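That kind of back-end branch could be approximated with a trivial heuristic. A minimal sketch in Python (the pattern and function names are my own invention for illustration, not anything from Google's codebase):

```python
import re

# Hypothetical operator detector: a query counts as a "power user"
# query if it contains any explicit search operator.
OPERATOR_PATTERN = re.compile(
    r'"[^"]+"'        # quoted phrases
    r'|(?:^|\s)-\S'   # negated terms like -keyword
    r'|\bsite:\S'     # site: restrictions
    r'|\bOR\b'        # explicit boolean operators
)

def is_power_user_query(query: str) -> bool:
    """Return True if the query contains any explicit search operator."""
    return bool(OPERATOR_PATTERN.search(query))

def rewrite_query(query: str) -> str:
    """Skip aggressive rewriting (synonyms, term dropping) for operator queries."""
    if is_power_user_query(query):
        return query  # leave the query exactly as typed
    # ...synonym expansion / term dropping would happen here...
    return query
```

Routing on this flag --- either to disable query munging outright or to pick a separately trained ranking model --- is exactly the kind of cheap branch the comment above describes.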
I think it has to do with shifting expectations. All of us who use the Web seriously, and have been for years, want a full text search engine. The average user wants what Ask Jeeves promised to be: something that takes vaguely question-shaped queries and spits back a fast answer. Or a glorified URL bar to outsource memory and effort.
No, you don't want a full text search engine. If you think you do, you don't remember the pre-Google world. It was impossible to use the older search engines to find a reasonable explanation of a common topic, because to Alta Vista and other search engines of that era, every page that contained a given term was considered equal to every other page, and it would give you all of them in a random order. You could add lots of AND and OR to try to exclude what you didn't want, and this might cut you down to 40 or 50 pages to go through to maybe find what you want.
But when Google first came out, it was a shock. You could just search for something like "Linux", and the most authoritative sites all showed up on the first page.
> and this might cut you down to 40 or 50 pages to go through to maybe find what you want.
At least those search engines gave you that many results to go through... now Google gives you less than that, full of spam (despite the index probably containing far more), and you'll be in CAPTCHA hellban if you try harder to get to the rest.
Yeah, I'm sort of surprised that there isn't a semi-popular "web grep" tool for people who would rather use regex, some understandable ranking algorithm with knobs to tweak, etc.
Of course, you'd have to read a manual to use it and it would have a ton of spam, but some people just want lower-level control - they still sell stick-shift cars.
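As a sketch of what such a tool could look like: a toy "web grep" over a set of already-fetched pages, ranked by raw regex match count with a couple of explicit, user-tweakable knobs. All names and default weights here are made up for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Knobs:
    """Exposed ranking knobs -- the 'stick shift' part."""
    match_weight: float = 1.0      # score per regex match in the body
    title_boost: float = 5.0       # bonus if the pattern hits the first line
    length_penalty: float = 0.001  # penalize very long pages

def web_grep(pattern: str, pages: dict[str, str], knobs: Knobs = Knobs()):
    """Rank pages by regex matches -- no query rewriting, no synonyms."""
    rx = re.compile(pattern)
    scored = []
    for url, text in pages.items():
        hits = len(rx.findall(text))
        if hits == 0:
            continue  # a page either matches the regex or it doesn't
        title = text.splitlines()[0] if text else ""
        score = hits * knobs.match_weight
        if rx.search(title):
            score += knobs.title_boost
        score -= len(text) * knobs.length_penalty
        scored.append((score, url))
    return [url for score, url in sorted(scored, reverse=True)]
```

The hard part, as the sibling comment notes, isn't the ranking logic --- it's crawling and storing the `pages` dict at web scale.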
Not just that, but the sheer scale of such an index. The size of the web now makes anything small next to impossible without a lot of funding. And none of the existing search engines will give you programmatic/data access to their index without a metric ton of cash.
How is it that spiders/bots are able to "index" copyrighted content? Is it just one of those things where the ends justify the means or a holdover/tradition or some such?
It’s some combination of fair use and raw data not being copyrightable. My understanding is that it’s only the creative expression that’s copyrighted, not the actual words. So, if you distill out all of the creativity into something that’s purely information about the work, you’re probably fine copyright-wise.
There’s a long tradition of compiling and publishing concordances, which are just indices of every place each word appears in the original text. They’re generally not useful without access to the original, so nobody seems to mind them very much. Google’s index is just a modern form of the same thing.
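A concordance is easy to sketch in code: it is essentially an inverted index mapping each word to the positions where it occurs, with all of the creative expression stripped away. A minimal illustration:

```python
import re
from collections import defaultdict

def build_concordance(text: str) -> dict[str, list[int]]:
    """Map each word to the word-positions where it appears.

    Like a classic printed concordance, this records *where* every
    word occurs but preserves none of the surrounding expression;
    it is useless without access to the original text.
    """
    index = defaultdict(list)
    for pos, word in enumerate(re.findall(r"[a-z']+", text.lower())):
        index[word].append(pos)
    return dict(index)
```

A web-scale search index is the same structure, just keyed by (document, position) and augmented with ranking signals.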
1) Google CANNOT provide you with technical search, because the choice of index/query filters is always limited (e.g., do I prefer exact matches over multiple partial matches?)
2) Google has shareholder and public responsibility. It means that the service (and its algorithms) is adjusted towards the biggest categories of queries performed.
All of this is a constant battle between precision and recall for a given query. Adding to the complexity, Google needs to account for:
* Extraordinary amount of users using their search
* Extraordinary amount of data on webpages
* Importance of authority
In smaller search engines (e.g. a shop's full-text search) you usually tune towards one use case. This in itself is already hard.
Google does that for all possible use cases and all possible queries, while still fighting the same precision/recall battle.
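For anyone unfamiliar with the terms: precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that actually get returned. A strict exact-match engine tends toward high precision and low recall; aggressive query rewriting trades the other way. A minimal illustration:

```python
def precision_recall(returned: set[str], relevant: set[str]):
    """Compute (precision, recall) for one query.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    """
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```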
To be clear: I think Google is terrible, but I also think that there is no other option for them at this point.
All of this became clear to me the moment I got interested in building search and relevance engines.
DuckDuckGo does something fancy with smooshing together Google and Microsoft databases with their own to create a half-decent search engine.
Cliqz has hardly anything indexed at the moment, but it actually gets relevant results from those. (e.g. "zoom privacy" brings up the Zoom privacy policy first, then three news headlines from the last 12 hours, then a news article from yesterday, then an IT@Cornell guide for making Zoom meetings private, then some more news articles, some stuff about HIPAA…) I really like it, even if it isn't great for programming at the moment.
In the early 2000s, it was the one search engine that returned relevant results. All the old ones returned a hodge-podge of key-word matches. Google used the number of citations as a strong signal of relevance (the number of pages that linked to a page meant something).
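That underlying idea --- a page matters if pages that matter link to it --- can be sketched as a tiny power-iteration PageRank. This is the textbook toy version, not Google's production algorithm:

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50):
    """Toy PageRank over a link graph {page: [pages it links to]}.

    A page's rank is the chance a random surfer lands on it:
    mostly inherited from pages that link to it, plus a small
    uniform "teleport" term.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank
```

Even in this toy, a page that many others link to ends up ranked above a page nobody cites --- which is exactly why early Google results felt authoritative, and why link farms later became the SEO weapon of choice.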
In the age of GMail beta, Google was my hero and I wanted to work for them. They were doing new things with the browser. First was Google Pack, which included necessary software like Firefox. Then Google introduced Chrome, which was even better.
Both began to decline. I don't know --- they became slower and clunkier, as if they were designed by the denizens of any number of other companies. Google had about 50,000 people at this point.
Recently, I can't stand even to read Google's documentation, because of its (1) bureaucratic wordiness, and (2) cluttered layout that reminds me of that video about Microsoft redesigning the iPod box. In the early 2000s, Google was a maverick. Nowadays, Google is hard to distinguish from any other corporate giant.
To put things in perspective, Google was founded in 1998, making them 22 years old in September. The "Halloween Documents" were released in August 1998, when Microsoft was 23.
Google is indistinguishable from any other corporate giant because they now are any other corporate giant.
Did anyone seriously think an Ad company would go any other way? It's the scummiest of industries. It was always just a matter of time.
And now we've given (allowed, stood by, whatever) them ALL the data for >50% of our cellular users for the last 10+ years. I'm an Android user too. Oof.
I see the same issue on GitHub too. Recently searched for a Windows API call and neither Google nor DuckDuckGo nor GitHub returned any result.
Cloned a git repo where I thought I remembered seeing the function, and 500 MB of downloading later, grep confirms that I remembered correctly and that the exact keyword I had been searching for is present multiple times in the source code.
My hunch is that since coders use an ad blocker anyway, it's not financially viable to operate source code search on the public internet.
I had this exact thing happen to me yesterday.
I copied a function header from GitHub and searched for it in GitHub search to find the declaration. No results. Cloned the repo, clicked "Go to declaration" in my IDE, and boom --- "you have reached your destination".
Wtf?
Yeah, GitHub's search is next to useless to me because I can't trust it. I search for a string, maybe get some results, but I can't be confident that those are all the occurrences of the string I searched for.
The SEO toxic sludge has become so pervasive that I can't trust any search these days. Here's an example from just last week.
I was searching for the official AWS security certificates (namely, for ISO27001), which AWS neatly publishes on their site. Even for something that specific, the real AWS certification page was at the bottom of the first Google result page. Everything above it was from various random consulting outfits, all trying to sell their "expertise".
When searches for a company's official security certificate are buried under SEO, we know the well has been thoroughly poisoned.
Thanks for adding an actual example to this thread. I haven't personally noticed search get worse, so I appreciate it.
Anyway, I typed in your query "ISO27001", and you're right, the AWS result is at the very bottom of the first page on mobile. But if you search "AWS ISO27001" it's at the very top.
But ISO27001 is not an AWS-specific thing, and the results above it are about the standard. It would seem equally bad to return AWS's product pages to somebody looking for information about the standard.
There's definitely an argument that there are too many results from random websites satisfying the "general information about this standard" intent, and it would be good if Google could guess what everyone wanted simultaneously, but the query is pretty ambiguous...
I've been desperately trying to get information on some retrocomputing equipment I bought, and Google has been next to useless. Except for a handful of results (no more than a dozen), the majority seem to be websites that scrape eBay listings and rehost that information. Much of the content that supposedly matched was behind a paywall, and it was difficult to tell offhand whether it would be of any use.
To be fair, in my case the lack of information is real (I've gotten only a minuscule amount of info by asking on niche forums sadly), but cutting out the noise early would be helpful.
But I perceive this as Google losing the battle with SEO. On one hand Google writes the algorithms; on the other, SEO tries to exploit those algorithms to rank as high as possible.
I was thinking that a solution would be, instead of page rank, an author rank. This rank keeps track of authors who are experts on certain topics. When you search for certain keywords, the articles of these known experts are ranked higher.
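A toy sketch of that idea (all names, scores, and the boost formula are invented here for illustration): each article carries an author, each author has a per-topic expertise score, and that score boosts the article for matching queries.

```python
def rank_articles(query_topic, articles, author_expertise):
    """Rank articles by base relevance boosted by author expertise.

    articles: list of (title, author, base_relevance) tuples.
    author_expertise: {author: {topic: expertise_score}}.
    """
    def score(article):
        title, author, base = article
        boost = author_expertise.get(author, {}).get(query_topic, 0.0)
        return base * (1.0 + boost)  # unknown authors get no boost
    return sorted(articles, key=score, reverse=True)
```

The open problem, of course, is computing `author_expertise` without it becoming the next thing SEO farms game.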
I think it's more than that. If you also consider AMP and various Chrome redesigns, there is this feeling that Google wants to take control of the web. URLs and site's actual locations are being de-emphasized, and searching is becoming more "ask question, get an answer" instead of "enter query, get documents best matching it".
Yeah. AMP is the intentional promotion of content whose creators care a lot about SEO over content which is more reputable, relevant and established and sometimes even with lower page weight. If they're losing a battle with spammers on mobile, it's because they gave them a loaded gun and lined up their search results in its crosshairs.
For me the line was the Panda update; that update just completely destroyed the quality of the search results. They did try to improve quality after it (with moderate success), but it never came back to the level it was at before.
The ones that do best, the ones that are the greatest public benefit, seem to be governed by a strong leader who stays at the helm.
As soon as the people at the top change, you get jostling and unclear leadership, and power gets diffused.
You get emergent behavior - lots of internal and external competing for power and interests - and the ability to say "no" or "this sucks" or "we will do this" happens less frequently.
Normally competition would take care of all this, but with big gorillas that dominate a market, it might take a while.
> Normally competition would take care of all this,
Not necessarily. When you find yourself with bean counters and MBAs at the helm, they will for sure optimize the company to outcompete others. Such companies eventually die from the rot, but not before they drag down the entire industry they operate in.
Very true. Two years ago I fully switched to qwant and DDG, and never looked back. It's not because these two engines are great, it's because Google's output is as good as random.
Maybe it's just that there's now a whole lot more money in the game of tricking Google? In 2010, people too dumb to use a computer were only just getting on the Internet, as smartphones were barely becoming a thing. Now those people are online, and there is a lot of money to be made from them.
SEO spam instead of car repair manuals is the least of it; this has much bigger and scarier implications. Just try imagining someone like Obama winning an election now... simply not possible. And it's not because people became dumber; they totally didn't. It's because dumb people now have a voice online. And spend money online. The Internet works for them now, and shows everyone what they want to see...
Overnight oats are another example. You just find pages upon pages of blogspam, and almost every page has the same useless information. No normal recipe site appears anywhere in the top 10 pages.
You may be overlooking cases where Google found the missing words or concepts in a page that linked to the hit, just not within the hit itself. That can actually be useful sometimes.
I couldn't agree more. I research a lot of history and get junk... so much junk on Google. I also research technical info for our website and get junk. What do you use? What do you recommend?
Yes. For example, when I'm looking for the datasheet of an electronic component, I often get SEO spam instead of a direct link to the PDF, which is usually readily available on the manufacturer's website.
Fully agree with you: Google, as a lot of us expected, has turned into a corporate scourge.
Which makes their mission statement "to organize the world's information and make it universally accessible and useful" come across like a bad joke. They really should update it since they don't even bother to keep up appearances anymore.