
Have to be honest, I'm surprised the article wasn't about SEO. That gets a lot of blame for ruining the internet, especially on tech sites.

But Google's propensity to reward sites and pages that are popular or new, rather than those that are actually more accurate or higher quality, is definitely an issue.




SEO is behind many of these dynamics, though. The "more accurate"/"better quality" signal is getting so noisy that rewarding freshness and hoping the user meant to search a very current topic is perhaps the best you can do. Quite disappointing of course (since we'd rather have good-quality content be easily reachable) but not entirely unexpected.


If you have a good suggestion on how to rapidly measure site accuracy and quality I know some VCs who would very much like to chat.

Well, no, I don't. But I highly suspect that they exist and would want to chat.


It's obviously a harder problem than I want to believe it is, but considering the terrible quality of results as of late, I don't think this would be worse.

First, start with a whitelist. Hand pick high quality publications, and rank them towards the top. This may tilt results back towards institutions, and away from blogs.

Second, punish similarity. If everybody is reposting AP or Reuters without any additional information, consider them a dupe and don't list them. They can run their portals, but they don't need to show up in search.
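
To make "punish similarity" concrete: a minimal sketch, assuming plain Jaccard overlap on word shingles (real deduplication systems use things like MinHash or SimHash at scale, and every name below is made up for illustration):

    # Treat pages whose shingle sets overlap heavily as duplicates of an
    # earlier, preferred page (e.g. the original wire story).

    def shingles(text, k=5):
        """Return the set of k-word shingles for a piece of text."""
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        """Jaccard similarity between two shingle sets."""
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)

    def dedupe(pages, threshold=0.8):
        """pages: list of (url, text), ordered by preference (wire source first).
        Keep only pages that are not near-duplicates of an earlier page."""
        kept, kept_shingles = [], []
        for url, text in pages:
            s = shingles(text)
            if all(jaccard(s, prev) < threshold for prev in kept_shingles):
                kept.append(url)
                kept_shingles.append(s)
        return kept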

It's come up multiple times in this thread: car manuals are a good example. They would be better off throwing away every result they have and hand-indexing the good information than keeping what gets returned right now.

Recipes in particular have turned into a giant story about the way grandma used to do it, with a picture, followed by the same couple of variants with different proportions. Pick winners by hand.

Someone has a finance question? Just put Bogleheads at the top, instead of whichever 59 affiliate credit card sites sprang up.

Need health advice? Put examine.com at the top, above WebMD and Healthline. Why? Because a human expert compared them and decided examine is a better first result. You could comb through tens of thousands of sites with a team of hundreds of people, something Google easily has at its disposal. What PageRank had, and what seems to be missing now, was a seed of "we trust these most," with the network growing out from there. It tried to find expertise instead of clickability. It was about getting you the best information first.
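
That "seed of trust" idea is roughly what the TrustRank variant of PageRank does: the teleport step jumps back to hand-picked trusted sites instead of to a uniform random page. A toy sketch under that assumption, with made-up link data, just to show the mechanics:

    def trust_rank(links, seeds, damping=0.85, iterations=50):
        """links: dict mapping each page to the list of pages it links to.
        seeds: set of hand-picked trusted pages receiving all teleport mass."""
        pages = set(links) | {p for targets in links.values() for p in targets}
        teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
        rank = dict(teleport)
        for _ in range(iterations):
            new_rank = {p: (1 - damping) * teleport[p] for p in pages}
            for page, targets in links.items():
                if not targets:
                    continue
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            rank = new_rank
        return rank

    # Tiny example: only examine.com is trusted, so the unreferenced
    # spam site ends up with zero rank.
    links = {
        "examine.com": ["webmd.com"],
        "webmd.com": ["healthline.com"],
        "healthline.com": ["webmd.com"],
        "spam-blog.net": ["spam-blog.net"],
    }
    print(trust_rank(links, seeds={"examine.com"}))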


How would you pick the options by hand? That way you would introduce the thought and point-of-view bias of the people working at Google into the results. Additionally, how would you pick the winner? For example, how would you decide that a technology article from one source is more correct than the same article from another? They also have their own biases introduced into their content.


>That way you would introduce the thought and point-of-view bias of the people working at Google into the results.

Correct. I would do that.


>They also have their own biases introduced into their content.

Yes. Good.

I'm not necessarily saying they'd hand-pick the best article for every single story, although techmeme.com and HN do that to an extent: when they notice a better version of an article, they replace the top link with the better version.


> If you have a good suggestion on how to rapidly measure site accuracy and quality I know some VCs who would very much like to chat.

I spent a few days thinking about it not so long ago and came up with something rarely mentioned. Don't get me wrong, I don't think I have completely solved the problem; I just noticed it changes the perspective.

If I remember correctly, from my perspective as a user, the biggest change Google introduced was ranking by page. Yahoo used to rank by site, not by page. Maybe going back to ranking by site would help create a good index.

A site would be associated with a fixed number of keywords, say 20, and that's it. That would give sites an incentive to pick the keywords they want to rank for carefully and really be experts on them, instead of having SEO experts decide which keywords to target this week and write empty TF-IDF-optimized blog posts.

This sort of search engine would not give you the answer to everything, but it would give power back to the websites. The information retrieval process would then be two steps (a toy sketch follows the list):

- find a good website

- find the information within the website
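
Not claiming this is how Yahoo's directory actually worked, but a toy sketch of the two-step, keyword-capped idea might look like this (the site, function names, and scoring are all hypothetical):

    MAX_KEYWORDS = 20

    site_index = {}   # keyword -> set of site domains declaring it
    site_pages = {}   # domain  -> list of (url, text) pages

    def register_site(domain, keywords, pages):
        """Register a site under its (capped) list of declared keywords."""
        for kw in list(keywords)[:MAX_KEYWORDS]:
            site_index.setdefault(kw.lower(), set()).add(domain)
        site_pages[domain] = pages

    def search(query):
        """Step 1: pick sites whose declared keywords match the query.
        Step 2: rank pages within those sites by naive term counts."""
        terms = query.lower().split()
        sites = set()
        for t in terms:
            sites |= site_index.get(t, set())
        results = []
        for domain in sites:
            for url, text in site_pages[domain]:
                score = sum(text.lower().count(t) for t in terms)
                if score:
                    results.append((score, domain, url))
        return sorted(results, reverse=True)

    # Hypothetical usage:
    register_site("example-manuals.org", ["toyota", "repair", "manuals"],
                  [("https://example-manuals.org/corolla",
                    "toyota corolla repair manual, torque specs, fluids")])
    print(search("toyota repair"))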


> If you have a good suggestion on how to rapidly measure site accuracy and quality I know some VCs who would very much like to chat.

Bring back some variant of DMOZ, perhaps in a federated (easy-to-fork) version. That was quite successful at surfacing the best-quality online resources by topic, and even the early Google index seemed to rely on it quite a bit. But it wasn't a VC-funded project, of course.


DMOZ, really? At the beginning, yes. But 5-6 years later I knew plenty of companies and even bloggers who would locate a "volunteer" and ply them with hundreds or thousands of dollars to get in, get free traffic, and get that beast of a PageRank 7 link.


Yes, but this only ever impacted categories where for-profit links are common (and over time, people learn to disregard these links). And Google Search still does a pretty good job of searching for relevant businesses, since it's one of the main things that people use it for.


The issues with the DMOZ approach were basically speed and corruption:

1. It was slow to add categories/sites, which especially hurt categories where things change pretty quickly (gaming, tech and media are good examples, since new systems and frameworks need to have categories added ASAP).

2. Editors were often drawn into corruption, and either judged submissions based on how much they were paid elsewhere or prioritised their own/friends/family's websites.

Both of these issues could potentially be fixed with some more resources and better oversight, but it may mean any future DMOZ equivalent would need a lot more funding than the previous one.


> The issues with the DMOZ approach were basically speed and corruption:

Federation would help with both factors, though. A workable "right to fork" is a powerful incentive against corruption. Notably, DMOZ was not federated or "forkable" in any real sense, even though it did have a reasonable number of sites mirroring it.


For scientific/technical domain stuff you could (a rough scoring sketch follows the list):

1) look for references to source materials

2) check reference quality - is the reference real? does the quoted text match the text from the reference? is it an academic paper published in a journal?

3) authorship quality - what is the academic "impact factor" score for the author?

4) confirmed viewer reviews - subjective review by confirmed users

5) accessibility score - automated user interface usability analysis
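
A hand-wavy sketch of how those five signals might be folded into one score; the weights, field names, and the assumption that each sub-signal is already computed are invented here, and in practice each sub-signal is its own hard problem:

    def quality_score(page):
        """page: dict with pre-computed sub-signals:
        references    - list of dicts with 'resolves', 'quote_matches', 'peer_reviewed'
        author_impact - normalized 0..1 academic impact score for the author
        user_reviews  - list of 0..1 ratings from confirmed users
        accessibility - 0..1 result of an automated usability audit"""
        refs = page.get("references", [])
        if refs:
            ref_score = sum(
                (r.get("resolves", False) + r.get("quote_matches", False)
                 + r.get("peer_reviewed", False)) / 3.0
                for r in refs
            ) / len(refs)
        else:
            ref_score = 0.0

        reviews = page.get("user_reviews", [])
        review_score = sum(reviews) / len(reviews) if reviews else 0.0

        return (0.4 * ref_score                            # signals 1 and 2: sourcing
                + 0.25 * page.get("author_impact", 0.0)    # signal 3
                + 0.2 * review_score                       # signal 4
                + 0.15 * page.get("accessibility", 0.0))   # signal 5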


Why? Where is the money in providing high quality information to the general public? The public wants cheap candy.

High-quality data exists, but it makes up little of the ad-supported web and little of what users want to read.


The public doesn't have much say in things, it consumes what it's given. Cheap candy is the cheapest to produce, so that's what gets delivered.


I see it this way: people tried to get exposure on Google by finding ways to do it, for example by exchanging links. Was it bad that people exchanged links? No. Was it also used by spammers? Yes. Did Google decide to punish it? Yes. Who got hurt? Normal website owners. Spammers found new ways. But the truth is that with each Google update normal website owners were punished, and for spammers it just got a little bit more complicated to find a new hack.

Now we are at a point where normal website owners have very few ways to rank high in search results, while people with money can buy their way up, either with very expensive SEO or with expensive ads.

Sometimes I wonder why the heck a normal website owner should even try to please Google. As a normal website owner, I don't feel Google gives me as much as it expects from me.

Don't link to this, use AMP, and God forbid you exchange links. There are books about how to please Google. But what is the point? It all comes down to who has more money. I don't, so I will never win a good position.

I think we should just forget all these Google rules, because they destroyed the Internet as it used to be: autodiscoverable.




