Google now defaults to not indexing your content

JohnFen · 2024-07-15T21:02:07 1721077327

This is a fascinating discussion.

> Google has transformed from a comprehensive search engine into something more akin to an exclusive catalog.

That alone goes a long way to explain why Google search had become worthless to me. I had thought that it was mostly that their attempts at interpreting "what I really want" are terrible, but perhaps the reason is actually that they don't index what I really want in the first place.

I almost never want brand/big name sites and the like, but that is mostly what I get.

linearrust · 2024-07-15T21:33:34 1721079214

> I almost never want brand/big name sites and the like, but that is mostly what I get.

Google is doing the same thing with youtube. Youtube was a place to get away from traditional media, cable tv, etc. That was the point of the "You" in youtube after all. But now, no matter what I search, it's mostly corporate/traditional media in the list. Youtube was a major part of the 'cut the cord' movement. Now youtube is rebranding itself as 'cable tv re-imagined'.

It's amazing how google and youtube did a complete 180 in just a few years. Google search and youtube are nothing like what it was 10 or 15 years ago.

blitzar · 2024-07-16T08:08:22 1721117302

> But now, no matter what I search, it's mostly corporate/traditional media in the list.

I wish - but now, no matter what I search, it's all idiot influencers youtube shock face thumbnails and contant that can only be targetted at retards.

Not being logged in, deleting cookies and rotating IP addresses may be wonderful for privacy, but the reccomendation engine shows you what the masses are watching and it is incredibly sad.

Gud · 2024-07-16T09:29:27 1721122167

Yes, and the idiot influencers are propagating the mass consumerist agenda for the corporations.

effie · 2024-07-23T00:06:10 1721693170

I observe this as well, however, because Google wants to catch your attention and keep you for as long as possible, it is quite possible that this is also because of the past view history or terms you search. That is, if in the past, you let play lots of videos from traditional media, or search things traditional media like to talk about, then this may reinforce their appering in the search results.

If you watch videos on space flights and search related terms all the time, I expect you'll get related recommendations and links on less mainstream channels.

vincent_s · 2024-07-15T21:23:21 1721078601

You're correct about Google's poor interpretation of "what you really want." This issue worsened with AI-driven query interpretation like RankBrain.

But the main problem now is that the content you're looking for just might not be in Google's index at all.

An additional data point: 10 years ago, Google would provide up to 1000 results per query. Now, it's usually no more than 100. They've even removed the total result count from search pages.

observationist · 2024-07-15T21:31:50 1721079110

Google thinks what you really want is to give them all your attention and money in return for the privilege of breathing on the same planet as them.

vincent_s · 2024-07-15T21:48:48 1721080128

This is just how businesses evolve. Early Google was driven by founder vision and engineering excellence. They wanted to build something better than everything else.

Now, those visionaries are gone. Google is run by managers focused on efficiency. Why maintain a massive, costly index when a smaller, cheaper one generates just as much revenue?

blitzar · 2024-07-16T08:10:13 1721117413

> Early Google was driven by founder vision, those visionaries are gone

Their visions are gone but they are still there - they just have planes to fly around in, yachts to sail on and interns to sleep with instead of "doing no evil"

1vuio0pswjnm7 · 2024-07-16T06:21:23 1721110883

It can get so bad that one is only searching a database of ads. Where the "content" is poorly disguised advertisement. I have found various "search engines" like this over the years. They tend to disappear after a short while.

It always makes me wonder: Who the heck uses these "search engines"?

Their existence suggests there is some completely captive audience to target, e.g., where the audience only knows one source for searching.

An audience that is not going to "reject" the results let alone evaluate them. Even if the results are poorly disguised advertisements. Because this audience is apparently incapable of locating another "search engine".

Perhaps this is also what happens when competition is stifled through anticompetitive practices. People never come to know what they are missing.

alyandon · 2024-07-15T21:12:45 1721077965

Yeah, Google has really tanked in its usefulness to me the past few years. I now routinely have to slap site:reddit.com on many of my searches to get relevant results.

What a joke. :-/

dannyphantom · 2024-07-15T21:53:03 1721080383

I really like seeking out collecting and curating Old Hollywood content, the kind that mostly lives on old niche blogs. Google is just not great at pulling them up to a point where if I am seeking those niche blogs, Yandex is my go-to.

I find much higher quality content on Yandex for those niche topics.

silvaring · 2024-07-16T07:27:18 1721114838

Are you talking about movie reviews or deeper research / cast interviews etc? If so how would you find that stuff on yandex, what search operators would you have to use?

crazygringo · 2024-07-15T21:54:57 1721080497

This post seems to be based entirely on personal anecdotal experience.

There isn't a shred of hard data to support the headline claim that Google now "defaults to not indexing content".

Google never indexed everything, removing duplicates, blogspam, useless pages, etc. Maybe they've changed their thresholds or maybe not. But this post provides zero evidence of anything. It's pure speculation without any facts at all.

vincent_s · 2024-07-15T22:03:35 1721081015

You're right, this post is based on personal anecdotal experience. I have access to Google Search Console data for over 100 websites, and most have many pages in the "Discovered - currently not indexed" and "Crawled - currently not indexed" categories, despite ranking well for some keywords and getting traffic. This wasn't the case 10 years ago.

Regarding "Google never indexed everything" - I'd say it came close. They did manual de-indexing for heavy spam sites and would even send an email when they did this. Apart from that, nearly everything was in the index, including duplicates. De-duplication happened at the ranking stage, not the indexing stage.

At some point, Google even had a second index, the Google Supplemental Index, for pages of lower importance.

crazygringo · 2024-07-15T22:32:26 1721082746

But you haven't described what kind of websites they are, or what kind of content.

You haven't provided even any comparison screenshots from over the years, much less graphs showing that this is an actual quantitative trend, as opposed to you just noticing different things now and maybe not remembering things entirely accurately.

I'm not saying that you're not telling the truth about your observations. But I am saying that you provide absolutely no basis for making broad sweeping generalizations about Google Search changing its "defaults".

Maybe you work in a category of low-quality content, while people in other categories have seen more of their pages indexed.

The point is, your personal qualitative observations from a miniscule subset of 100 sites is nowhere near sufficient to make a provocative headline claim such as "Google Now Defaults to Not Indexing Your Content".

hackerbeat · 2024-07-15T23:22:26 1721085746

Nah, OP is right. Happens for all new sites these days, regardless of the niche.

altdataseller · 2024-07-16T05:57:09 1721109429

[flagged]

hackerbeat · 2024-07-16T06:22:32 1721110952

[flagged]

altdataseller · 2024-07-16T07:50:58 1721116258

You’re the one making a nonchalant claim “nah op is right” unbacked by evidence, not me

hackerbeat · 2024-07-16T08:47:34 1721119654

Like you run around with evidence all day to back up your comments and statements. Ever heard of experience? OP and I have been through this for years, no need for a PowerPoint for everyone who asks.

Also, you're just defending Google here for no reason, while they're breaking the web. Please continue if that makes you feel better and smarter.

altdataseller · 2024-07-16T09:40:36 1721122836

I’ve been doing search engine optimization, and trying to rank websites since Altavista, Google Panda, Penguin, exact match domain spam, etc since early 2000s and almost all pages of sites I’ve created have been indexed. So we can fight anecdata vs anecdata all day.

And i’m not even defending Google. They’re awful especially their recent decision to sell their domain biz to Squarespace. I hate Google as much as you. But whether Google is good or bad has nothing to do with this. Thats an emotional strawman

ChrisArchitect · 2024-07-15T21:59:05 1721080745

Weird description from OP of domains previously being indexed hours from creation. Domains aren't 'magically' being indexed without any kind of nudge. Maybe it was registrars or associated systems letting the engine know. Maybe it was you searching for your new domain. There's always a trigger even if you can't see it directly, how else would it know about anything? And over the years indicating to the engine/spiders that a new site existed depended more and more on site owners to let them know via submissions and console setup etc. The rest of this seems to be personal anecdotes/hearsay etc as with many SEO posts. I would think everything is still being indexed, but not shown because of low relevancy, and whatever other things google is trying to do to clean up results (which is fine, not saying that isn't happening). Show me the log where no spider ever touched your pages or did and it's not showing up before jumping to these conclusions.

vincent_s · 2024-07-15T22:11:30 1721081490

You're right, it wasn't magic. For the brand new WordPress sites, I believe it was due to the default pings that came with the standard WordPress installation.

When you published a new blog post on a fresh WordPress install, WordPress would automatically send out a ping. This feature, called "Ping-o-Matic" is still present in every WordPress installation. You can read more about it here:

https://developer.wordpress.org/advanced-administration/word...

Google likely picked up these pings and quickly indexed the new content. It wasn't an official process, but it worked reliably. The system is still in place, but Google doesn't seem to care anymore.

nostromo · 2024-07-15T21:28:44 1721078924

Google Search hasn't felt less relevant to me ever than it does now. And I first stopped using Yahoo and switched to them some 20+ years ago.

It'll find anything except what I'm trying to find. Quotes are useless. The content itself is often garbage. It works ok on common queries, but that's not when I need it to work - I need it to work the most when the query is hard. The long tail is the only thing that matters when a user is judging a search engine's quality.

The web itself has gotten worse over time, but that's also partly Google's doing. Google extracted all the value out of the open web and kept it for themselves. Meanwhile online publishers of all varieties are dying, despite being the ones producing much of the value. Google should have identified this as a strategic threat decades ago.

Now I just keep a tab open to ChatGPT all day and use it as a search engine without all the trouble of dealing with webpages.

Arnavion · 2024-07-15T21:59:42 1721080782

Downside of ChatGPT is that it makes stuff up, of course, even when you ask it for URLs.

A few days ago I was going to send a patch to Alpine Linux, and I remembered they had a document for how they want commit messages to be authored but couldn't remember the URL.

Google search results for "alpine linux commit message style doc" only returned garbage in the first ~50 results - pages from Alpine's wiki unrelated to commit message style, pages from other projects like the Linux kernel about their commit style, and batshit results like https://docs.ansible.com/ansible/latest/collections/communit... (happens to contain the word "commit").

So I asked ChatGPT "What is the URL of Alpine Linux's style guide for commit messages?" and it confidently told me it's https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/C... , which does not exist. I replied "That is a 404" and it responded "I apologize for the inconvenience. Alpine Linux's style guide for commit messages may have been moved or updated since my last training data. To find the most current style guide for Alpine Linux's commit messages, you can visit their repository directly and look for the CONTRIBUTING.md file or [go read their documentation and figure it out yourself, paraphrased]". The first one is a lie - a file named CONTRIBUTING.md has never existed in the master branch of that repo, as git easily reveals. The second one is schizophrenic - it's telling me to read the same file I just told it doesn't exist. The third one is unhelpful.

I had to find it myself. (It's https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/C... )

viraptor · 2024-07-15T23:15:37 1721085337

You can get the sweet spot with https://perplexity.ai/ for many cases. It does the searches, aggregated answer, and the actual supporting links.

It got back with "The URL for Alpine Linux's style guide for commit messages can be found in the README.md file of the aports repository on GitLab. The specific URL is: https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/R..."

(+ extra links that include the one from the end of this document, which is what you're after)

It could do better by returning the actual link rather than a page where you can find it, but it's still better then Google. (Kagi pointed at the same readme at least)

Yawrehto · 2024-07-15T22:21:49 1721082109

And it doesn't cite sources, and when it does it makes stuff up. It's even worse than Google; at least with Google I know where the information is coming from.

skeledrew · 2024-07-15T22:34:22 1721082862

I only use ChatGPT for things which don't really need a source, such a code snippets. Either a snippet works or it doesn't. For anything where I think an LLM might mess up without me realizing, I go for Perplexity, so I can then check the sources to confirm.

karencarits · 2024-07-16T15:26:47 1721143607

Thanks for providing an example of the search query you had problems with. Interestingly, I haven't noticed the decay in Google search quality that many are complaining about. The first hit when I use your Google query is https://github.com/alpinelinux/aports/blob/master/COMMITSTYL... which seems correct?

Arnavion · 2024-07-16T16:15:31 1721146531

Not for me it isn't.

In fact now the first result for me is my very comment above, but the actual COMMITSTYLE.md is still not in the first 50 results. The only result from even the right domain in the first 50 results is https://gitlab.alpinelinux.org/alpine/docs/user-handbook (at #12, and of course irrelevant).

remram · 2024-07-16T03:38:42 1721101122

So it made up BOTH those answers. There was never a CONTRIBUTING.md, and COMMITSTYLE.md was added in 2020.

https://gitlab.alpinelinux.org/alpine/aports/-/commit/0b7097...

ajkjk · 2024-07-15T21:49:36 1721080176

Yeah it's really horrible. I can't imagine what ridiculous metrics they're using internally to justify the changes. Well I'm sure some of them are "optimize the algorithm for profits"... but surely the fact that people are giving up on Google entirely as a result shows up in the data also? Or maybe it's just that the arms race against SEO has been temporarily lost.

I can't remember the last time a google search gave a useful result, outside of appending "reddit" to the answer so that the result is (probably) written by a human instead of bs blogspam. And I'm sure that won't work for long either.

sfmike · 2024-07-16T03:35:42 1721100942

I'm loving perplexity over chatgpt am I crazy but I find it so fun to use for web search tasks/queries

djha-skin · 2024-07-15T21:42:15 1721079735

My recent experience doesn't match the experience described by the OP.

I recently logged into google and asked that they index my domain (djhaskin.com). They asked me to put a TXT record in there proving I owned it and I did so. Then their "website console" thing showed that my website was indexed[1]. They have a console for this stuff now[2]. They recently showed me a page in there where that displayed some URLs that weren't indexed and which were. I requested a re-index of one of the non-indexed URLs, but the others were just broken/junk/RSS feed urls, so it was fine that they weren't indexed. The console gave me a ton of tools for making sure my site was indexed, and told me why if it wasn't.

I had plenty of tools to get my site indexed and felt like I was in control. I don't feel any sense of mystery about what is happening and I receive notifications when indexing fails.

1: https://1drv.ms/i/s!AoAOaR6dYP8RgcEAmUu_3ZrInrzbLw

2: https://search.google.com/search-console

vincent_s · 2024-07-15T21:52:19 1721080339

That's great, you've done the right thing. As your site grows, keep an eye on the "Crawled - currently not indexed" row.

candiddevmike · 2024-07-15T22:29:07 1721082547

Im seeing this across so many pages. They're unique--docs and about pages for products I built--and Google refuses to index them.

dotancohen · 2024-07-24T09:09:22 1721812162

Do you have clear links to those pages from indexed pages? Do you have a site map, or are you relying on Google indexing everything from the menu and footers?

hackerbeat · 2024-07-15T23:25:31 1721085931

The console has been around forever and getting a few sites “indexed” means nothing these days. Google will show these to a handful people a day to make you believe they care.

SoftTalker · 2024-07-15T21:11:20 1721077880

We're back to the web of 1996, where search sites such as Yahoo were manually curated.

aiauthoritydev · 2024-07-15T21:41:51 1721079711

Future of internet is going to be AI driven content shown in AI driven adaptable UI. Very often this content would be shown directly on Google properties such as GMAIL. Of course, Google needs to figure a way out to pay people for this which I am sure they eventually will.

kjkjadksj · 2024-07-15T22:36:15 1721082975

Eventually we will just have our ai agents reading the content while we sit in a pillow lined room doing the real important work

barnabee · 2024-07-16T02:36:54 1721097414

That's ok, I'm waay ahead of them — I've been defaulting to not visiting Google for years now.

londons_explore · 2024-07-16T02:46:50 1721098010

If I were to take a guess, the reason you're seeing URL's not being indexed is Google only has finite indexing resources, yet AI content generators can generate a near infinite amount of content.

That makes your real content a smaller proportion of the whole web, and therefore less likely to meet the threshold for fitting inside googles finite indexing budget.

summerlight · 2024-07-16T01:54:20 1721094860

IMO, this doesn't seem a coordinated strategical movement (if it was so, they should've done it much better than this) but more of computational resource saving. You'll be surprised by size growth of the entire web and its degradation on signal to noise ratio. My gut says more than 99% of incremental web pages are filled by some auto generated craps. The problem is there's no great economical way yet to figure out which is garbage and which is a genuine content. They should develop a better, scalable technologies for that (and it's fair to say that they should've focused more on this), but LLM is still too expensive to run and vulnerable to lots of attack vectors.

AlienRobot · 2024-07-16T02:53:11 1721098391

The real problem imho is that Google's and other search engines' design are fundamentally flawed and can't be fixed.

They want a single box that can do everything because they don't think an user can be given a minimal amount of training to use a search engine, and they want all results ranked in a single, linear list similarly.

It's becoming increasingly clear that when you do that, all long tail results turn into garbage.

Google has become essentially a phonebook. If you type the name of a thing, Google can quickly find its official website. Anything more complex than that and it quickly derails.

A good example is reviews. Google literally can't find reviews if you type "reviews" in the query, because it can only find webpages that strictly contain the terms you typed. This means that for a review of something to be found, the writer would have to literally write "reviews" in their webpage, or Google would have to interpret the query non-literally and heuristically categorize webpages as reviews, which is error-prone, because then you'll have lots of webpages that aren't reviews appearing when you search for reviews and the word "review" you typed is apparently ignored completely.

The whole authority concept also feels fundamentally flawed.

If the most brilliant researcher in a field has a blog that is a gold mine of information, that blog is never going to appear in any search result because this person would be too busy doing their research to waste their valuable time building backlinks to rank.

There are too many cases where "ranking" will go wrong. It's wrong to assume Google can deliver the best result as the first result, as it has no way to make value judgements on the content of articles, but this assumption apparently drives Google itself to implement policies to hurt its own results. They seem to only care about the first result. They don't want users to browse a whole list of results. I believe "browsing" is the key word here. If it's not possible to browse, it's not possible to find it yourself, and you depend entirely on a machine that is a black box to do the finding for you. When the machine fails, there is no recourse.

davidgerard · 2024-07-17T20:05:47 1721246747

> If you type the name of a thing, Google can quickly find its official website.

with five ads before it pretending to be its official website

breck · 2024-07-15T21:51:29 1721080289

> "to organize the world's information and make it universally accessible."

Their website actually still says this: https://www.google.com/search/howsearchworks/our-approach/.

It seems from their recent actions their mission is to "organize the world's information and drip it out as slowly as possible, covered in ads".

If you have had it with (c) and paywals and ads, come join the revolution which is the World Wide Scroll: https://wws.scroll.pub/

xemdetia · 2024-07-15T21:47:16 1721080036

I just wonder when the dam is going to break. Part of how Google spun money for so long is because they had high quality search to run ads on, but some of these latest changes have completely lost the plot. I don't need an animated ai response to my basic question: I want an answer. Especially when a bit ago on a different device I asked the same question and got the answer the first time. I haven't been so inclined to entertain what if I just did it myself thoughts for search in at least 15 years but it has become very an increasingly loud thought.

marshray · 2024-07-15T22:46:54 1721083614

Sincere question: Why should we think that your experience is common?

I remember the olden days when Google was hailed for its brilliance in A/B testing the subtle shades of colors it used on its home page.

Why should we think they aren't producing the product that most people want?

pests · 2024-07-16T02:51:31 1721098291

Just because it's popular doesn't mean it's what people want.

Most people were hosting on AngelFire or Groceries back in the day, but as we see today that's not what people wanted.

sfmike · 2024-07-16T03:35:05 1721100905

They've probably realized that those who will make content that makes them more money with ads are the same willing to go through hoops to use search console to be indexed. Cheaper costs of indexing and auto sorting to those that are more monetizable in nature. Also no toil. For them its a multi front win.

aiauthoritydev · 2024-07-15T21:37:55 1721079475

I think the problem is not Google specific rather the internet has grown far too large with too much of crap floating around. Google, in my opinion has done the best job of getting relevant information followed by Reddit.

While OpenAI etc. is pretty good (so does Google Gemini) what is OpenAI like interfaces prevent me from doing is to segue from a focused topic to related areas to discover knowledge on the periphery, which is the most important aspect of learning in my opinion which chatbots today are not able to do that well.

HN historically has lot of G haters. Which is fine, but I feel a lot of criticism is not really reasonable.

vincent_s · 2024-07-15T21:43:57 1721079837

Yes, there's undoubtedly too much low-quality content on the internet. But my post isn't mere criticism of Google. It's an attempt to explain and understand how Google's approach has changed over time.

In the past, Google would index nearly everything they could find, then filter content through ranking algorithms. They would downrank what they considered junk, but with enough query refinement, you could still find it. This approach also allowed you to find sites that Google might have incorrectly categorized as low-quality but were actually what you were looking for.

Now, the situation is different. Google isn't just doing its filtering at the ranking stage, but at the indexing stage. This means you can refine your query endlessly, but you'll never find content that isn't in the index to begin with.

aiauthoritydev · 2024-07-15T21:50:16 1721080216

Thanks for the clarification.

I doubt if this is a new thing though and both options are not mutually exclusive. Google probably now has enough signals to detect a vast majority of pages to be not worthy of even crawling. I would be surprised if they figured this out in 2024 and not in 2014.

Chances are these pages are absolute garbage.

ipaddr · 2024-07-15T23:07:33 1721084853

It's more like your personal blog doesn't get included and is replaced with tiktok, instagram and others. Your blog isn't absolute garbage anymore than some random tiktok video. You are likely to find more garbage now because content comes from large entities.

aiauthoritydev · 2024-07-16T06:01:41 1721109701

That is your assumption and grossly oversimplification. But I do see the point of not indexing vast majority of internet traffic that is low value.

PS. Google does not benefit at all promoting Instagram or Tiktok. They benefit from simpler an independent content lot more. Clearly it would be other way around. I have seen Pinterest simply disappear from Google recently to give you an example.

JohnFen · 2024-07-15T22:04:35 1721081075

> Which is fine, but I feel a lot of criticism is not really reasonable.

My criticism of Google search is unrelated to my feelings about Google itself. My criticism of Google search is that it gives me truly awful results from my queries.

That's objectively true, as demonstrated by the fact that other search engines (even ones that use Google in the background) are better at that.

xnx · 2024-07-15T21:29:31 1721078971

Lots of claims without any specific examples or hard evidence. Typical SEO blogpost.

vincent_s · 2024-07-15T21:38:34 1721079514

You're right, there's no hard evidence. Google keeps its secrets well, making it difficult to prove these claims definitively. I'm speaking from experience, but there is a way to observe this phenomenon yourself if you have a website:

1. Register on Google Search Console

2. Go to the page indexing section

3. Look at the rows "Discovered - currently not indexed" and "Crawled - currently not indexed"

For example, my own site has a two-digit number of URLs in both categories. These are blog posts Google simply doesn't want to index for reasons unknown.

I have access to Google Search Console data for over 100 websites, and most/all of them have the same issues. This includes sites (like my own) that rank well for certain keywords and receive traffic.

bmar · 2024-07-16T02:20:20 1721096420

Just checked my site, and the "Crawled - currently not indexed" pages are just garbage pages, like tags, categories, query params, etc.

tonymet · 2024-07-15T21:23:33 1721078613

what are the better search engines out there featuring quality content? I don't mean privacy focused ones like Brave, Duck Duck Go . How about indexes that have useful, thoughtful, creative content.

JohnFen · 2024-07-15T21:32:13 1721079133

I've been having a great experience with Kagi, personally. Especially once I'd upranked and downranked various sources according to my needs.

Upranking/downranking sources turns out to be more powerful and meaningful than I thought at first. At first, I didn't even bother with it, but it turns out that being able to provide a "correction factor" signal makes a serious difference.

ipaddr · 2024-07-15T23:03:13 1721084593

What happens when your needs change. You down rank say youtube but later you need to look for a specific video. Multiple that by a number of sites. Do you have different profiles? This one is mainstream faag, this one is blogs?

JohnFen · 2024-07-16T03:04:56 1721099096

As everforward says, lenses are a way of managing that. Downranking a site doesn't mean you'll never see results from that site (unless you want it to), it means that the site will score lower in the results than it would otherwise. And you can change the rankings anytime, if you wish.

I don't use the lenses myself because I don't need different rankings for different things, but they might address your needs.

everforward · 2024-07-16T01:57:13 1721095033

They’re called Lenses, but yeah, basically. Each Lens has its own up/downrankings per site.

vincent_s · 2024-07-15T21:27:38 1721078858

There's Marginalia [0] and Kagi Small Web [1] but you could also try Bing. It's far from perfect but it has more sites indexed than Google.

[0] https://search.marginalia.nu/ [1] https://blog.kagi.com/small-web

qudat · 2024-07-16T00:51:40 1721091100

Here’s what’s happening and everyone is starting to realize it: the search engine is getting reduced in scope to a bookmark service and local search.

Want an answer to a question or to learn more about a topic? Use an answer engine like perplexity.

My need to go to a search engine is dropping quite a lot now because of LLMs.

I’m using Kagi as well which is nice.

AlienRobot · 2024-07-16T01:55:59 1721094959

Wiby is another.

sakisv · 2024-07-15T22:29:41 1721082581

I know that Google has extremely talented people tackling these kind of things, and with vastly more experience than I, but I can't help to think that the problem is that we as users have extremely diverse needs and use cases, which is impossible to satisfy them all.

It's even more impossible to satisfy them in a way that's also useful for Google's own business model.

The most obvious and potentially naive way of doing that would be to allow end users to upvote or downvote search results. I know that Google already does that in an automated way to some extend, but the problem is that that signal is then used to determine the quality across all the users which, as I mentioned in the beginning, is impossible to get right.

Instead that metric should only affect each user's own search results and not everyone else's. This could improve the quality of the results, bring more people back, and eventually increase the revenue. It would also help prevent "gaming" the system.

What am I missing here? I can't believe that they're not aware of that. I also can't believe that they don't want to fix it or that they're so focused on ai that internal politics don't allow them to do anything else. So what gives?

lofaszvanitt · 2024-07-15T22:12:18 1721081538

I recently launched a site that sports a lot of images. Well it's an image site. You go there to view images on specific topics. The google search console always complains that there is very few text on the page and it only indexed 1/6th of the image pages (it doesn't even fetch the images itself... who knows whether it's a good or bad sign ;|). Go figure, this is an image site.

Should I start to write text next to each image, like:

"Mech approaches another dark, very evil looking mech on a bright day and swings its laser sword to decapitate the evil one."

Gimme a break. For 5K+ images... :D. The topic, title and description is not enough, it needs more text to believe that this is what the images are about. No, your images will not be indexed, will not be included in the image search results, because you are not part of the exclusive club. No monopoly here.

Bing webmaster tools tells me that there are no highly ranked sites that link to my content, and there should be :D. I just started the site, how would there be any linkage to it?! Are they insinuating that I should create fake sites to promote my content or maybe pay for seo? No monopoly here either.

Yandex... I can't really figure out whether their indexing works or not, it sometimes complains, then doesn't do anything for weeks. Then it comes up with another made up problem that is nonexistent. It acts like a drunkard.

I haven't tried Baidu, because they need some local phone number and they clearly can't send activation SMS to Europe.

Next I'll try with a news site and write a blog post about my experiences. Truly interesting times.

jsnell · 2024-07-16T05:04:20 1721106260

So, I'm kind of confused about what you expect to happen.

You've got a pile of images with no descriptive text. What kind of query do you think would make a search engine include one of those pages in the result pages? They need something to work with.

At most it sounds like it could be people using your site name in the query. But that only happens when you have users, you can't bootstrap your popularity like that.

And if there is no set of reasonable queries where your pages are likely results, what is the point of of indexing them?

iggldiggl · 2024-07-18T12:41:27 1721306487

> You've got a pile of images with no descriptive text. What kind of query do you think would make a search engine include one of those pages in the result pages? They need something to work with.

Although Google these days definitely runs image recognition and OCR while indexing images for its image search.

lofaszvanitt · 2024-07-16T11:32:28 1721129548

There is title, descriptions, tags... If you flow it around with generated text, because that's what people are doing to game the system, it will be more reliable or what did you miss? :D

kstrauser · 2024-07-15T21:06:09 1721077569

> Google has transformed from a comprehensive search engine into something more akin to an exclusive catalog.

That's the only plausible long-term path to keep its search results competitive and relevant.

> For content creators, it presents a significant challenge: how do you gain visibility if Google refuses to index most of your content?

Don't publish junk that it doesn't find interesting under the assumption that it "owes" you a front page search result.

I'm not a Google fanboy. I have a paid Kagi account and I use it nearly exclusively. I want Google to stay competitive though. There's a vast army of SEO spammers who think they know the one magic invocation that will drive traffic into their willing arms, or, at least, are able to convince paying customers of it. If Google could wave a magic wand that could accurately identify all of the junk that exists purely to increase results rankings, and they used that knowledge to permanently remove it from the results they show me, well, I'd probably stop paying Kagi to do that.

vincent_s · 2024-07-15T21:13:27 1721078007

This is a common initial reaction, but it misses an important point: Google's process for deciding what to index isn't perfect. There's no perfect solution. Many valuable, high-quality pieces of content that people would find useful never make it into Google's index.

In the past, users could refine their search keywords until they found what they wanted. This approach doesn't work as well now. The main reason? The content you're looking for might not be in the index at all. Google's selective indexing aims to reduce spam, but it also limits the diversity and depth of discoverable information.

emurlin · 2024-07-16T02:10:47 1721095847

  > Many valuable, high-quality pieces of content that people would find useful never make it into Google's index.

I see your point in the light of the article (not indexed = not visible), but it feels like the things that _do_ make it need to follow very particular content and style patterns to rank high.

Anecdotally, this observation comes from searching for any term and seeing the results: they are usually similar-looking plausible-looking-but-actually-low-quality results that seem to follow the same or similar structure and have the same content. This does indeed limit the diversity and depth of information, but I'm not so sure it reduces spam, as these low-quality sites seem to be as prevalent as ever before, if not more.

From experience writing articles to a small tech blog, this means that it's quite difficult to get well-researched articles to rank well, even if they're indexed.

For example, I've written an article on how to block hotlinking (I've just checked, and Google says it's indexed). If you search for this, my article on a not-so-well-known blog is nowhere to be found(*)(**), and this is somewhat expected, for a myriad of reasons. The problem isn't that my post doesn't rank, but rather that none of the top-ranking (or even not-so-top-ranking) results are wrong. They are either about how to do this on cPanel or whatever, which is ineffective (but granted, could be what people are looking for), or instructions using the `Referer` header, which is ineffective.

These days, browsers offer headers like `Cross-Origin-Resource-Policy` which can completely solve the particular issue of hotlinking, unlike `Referer` which is easily bypassed using `referrerpolicy="no-referrer"`. However, because most 'authorities' seem to be wrong on this issue, the correct result isn't displayed, because it's a hard problem to solve algorithmically (or even manually).

(*) This doesn't affect just Google, though.

(**) Because it's indexed, adding the right keywords (which you wouldn't do in this case unless you already knew the answer) does bring it up, although from federated high-authority sites instead of the original canonical source.

kstrauser · 2024-07-15T21:30:34 1721079034

I understand what you're saying, and while I'm sympathetic to that position, you've gotta lay a lot of that at the feet of spammers who try every trick they can think of to fill the results with crap. The same thing happened with email. Maybe I want my doctor to send me information about, I don't know, herbal cholesterol treatments, but she can't because they get caught in a spam filter.

Countless terabytes of junk web content make it hard to find the needle in the haystack. Frankly, I don't want to see that low-value part of the web. Every time it shows up in search results, that's one less decent result appearing on my screen. I suspect that anyone who says they want to see everything doesn't really appreciate what everything means. There will never be a page on Experts Exchange that I want to see. I'll never be grateful for a result from answers.com. Although those sites are capable of creating useful information, their business model been to dump a ton of junk on search engines with the hope that people accidentally click on it so they can make some ad money. I picked on them because I can name them off the top of my head. There are a million sites just like them. And none of them have anything to offer me whatsoever.

I'd hate for Google's changes to remove the site maintained by that expert kid in Tampa who has precisely the answer I need, along with part numbers and wiring diagrams. If Google has to nuke the Experts Exchanges and answers.com of the world so that the Tampa kid's site does start showing up, yay!

rglover · 2024-07-15T21:03:59 1721077439

How Google can fix their search: embedded ranking.

Give me an embeddable iframe that I can optionally add to my site which allows visitors to give feedback (think a floating Reddit upvote button). Require that button's access to require an authenticated user via Google.

Ranking in the algorithm is weighted such that the organic user votes are the heaviest, and content length, keywords, etc., are the lowest.

Remove all the AI crap (or tweak it so I can chat with a bot to improve my search, but it's not Foie Gras'd down my throat). Make ads a free vs paid experience (want to avoid ad results, pay Google $5/mo for a clean result set).

This would make the only way to "game" SEO authentic, quality content. In essence, it's taking the current hack (appending "Reddit" to the end of a search query) and building it into the core experience.

crowcroft · 2024-07-15T21:12:03 1721077923

99% of websites won't implement, promote, or get engagement with this universal button.

And so then the way to rank is to get in a group chat with all your SEO buddies and trade ~~backlinks~~ upvotes with each other. Or make bot accounts that appear real.

Even if this idea worked, the idea of giving Google even more consolidated power/knowledge over the open web is horrifying to me.

rglover · 2024-07-15T21:17:56 1721078276

So those sites will be ranked accordingly.

Upvote farming could be mitigated by vote frequency. That sort of behavior would be obvious in the data as you'd see significant voting activity over some time range (which wouldn't look like organic search behavior).

You can't outright avoid the consolidation of power with search engines, unless (and I think this would be smart for someone to do), you create a decentralized version of what I described that's part of an alternative search engine.

cjblomqvist · 2024-07-15T21:33:52 1721079232

Google has been playing this game for 20+ years and failed hard the last 10+. You think you could easily solve it?

I'm not saying they're applying the best of ideas and doing it well, but I'd wager none with good experience in this field would bet it would be that easy. I think it's more akin to hackers and defenders, each surpassing the other one in a never ending race.

rglover · 2024-07-15T21:44:56 1721079896

> Google has been playing this game for 20+ years and failed hard the last 10+. You think you could easily solve it?

Yes. Google hasn't really produced anything of substance post-2010. They fell into the typical corporate bureaucracy trap. The rejection of ideas like mine are likely even more common internally in Google. In an org of that size/stature, people jockey for position, they don't create (and that's not an opinion, it's objective reality).

O.G. Goolge would try anything (I remember they offered radio station automation software briefly).

Could I be wrong? Sure. But if public opinion of my core product is waning massively, I'd rather trust the mad scientist than the Wharton dork who's focused on his L-status. Worst case scenario the failure is a rounding error on my balance sheet.

crowcroft · 2024-07-16T13:38:29 1721137109

Maybe, although I think your understating the complexity that filtering through data at Google scale can involve. When you have enough users in a system many many 'normal' people will look and behave like bots/fraud, things that should be obvious are often not.

I wonder if Stumbleupon could have achieved something similar to what you're proposing if they had tried to build a search engine.

esafak · 2024-07-15T21:10:04 1721077804

Organic ratings are only valuable to the extent that they are not manipulated. At Google's scale, you attract click farms: https://www.fraud0.com/wp-content/uploads/2023/02/Smartphone...

rglover · 2024-07-15T21:14:05 1721078045

This could potentially be mitigated by observing vote activity from IP ranges and frequency. Throttle vote activity for a range based on frequency over X window of time. Banning accounts could help, too, as well as more stringent account verification (e.g., want to enable voting on this account, give us a phone number or other form of 2FA).

It could get murky, but you could even have votes weighted by an account score (not quite sure how that would take shape).

wruza · 2024-07-15T21:44:50 1721079890

They’ll give you a phone number then and enter totp and solve captcha. And click as organically and humanly as google is concerned, with sleep schedules and days off, with click hesitation and cursor movement profiles. It’s not that big of a deal, all instruments are available.

jfengel · 2024-07-15T22:16:26 1721081786

Spammers are really good at using other people's computers and accounts.

rglover · 2024-07-15T22:26:02 1721082362

If anybody has/can solve that problem, it's Google. I think their deficit is vision, not implementation.

aiauthoritydev · 2024-07-15T21:40:46 1721079646

Google has far better signals than that. Organic cross traffic, time people spending on the page etc. serves the same purpose and that info is available for nearly free. Adding voting mechanism makes the whole business too expensive. (Imagine maintaining a complex, frequently mutable table column for every page on internet).

rglover · 2024-07-15T21:49:13 1721080153

No different than their analytics approach. The only difference is manual vs. automatic updates (and those updates can be queued as the computed result doesn't need to update immediately).

eastbound · 2024-07-15T21:08:49 1721077729

On other words, now that the big players have taken most of the easy fruits, build the long-term long trail end of useful constructions where there isn’t much money to work with.

bastawhiz · 2024-07-15T21:08:27 1721077707

I don't think you can do this as part of the page. You can make the iframe almost invisible and place it directly under the user's cursor at all times. And the iframe would be none the wiser: as soon as the user clicks anything, they "vote".

Ultimately this needs to be part of the browser chrome, since you can't trust the site.

rglover · 2024-07-15T21:10:15 1721077815

I think you could solve this by using some trickery to know the position of the element in the window. Maybe an iframe isn't exactly it but something. I think you're right, something in the browser would be best (or at least an extension).

julienmarie · 2024-07-15T21:47:32 1721080052

Google weirdly does that already, in a way. It tracks when you come back quickly from a page to the search results and interpret this as a negative feedback ( not a positive one ).

JohnFen · 2024-07-15T21:06:04 1721077564

> Require that button's access to require an authenticated user via Google.

Wouldn't that mean that tons of people won't be able to "vote"?

rglover · 2024-07-15T21:11:16 1721077876

Hypothetically. But that isn't necessarily a bad thing. I'd rather have things rank based on a limited sample size than none at all.