Ask HN: Is there a search engine which excludes the world's biggest websites?

noad · on May 16, 2020

This is a great question, I also want a way to search the internet but exclude all major media domains as well as any company over a certain size. So I just want to search through old blogs, SO, non-corporate social media, weird forums, etc.

There are so many cool things I remember reading on the web like 10-20 years ago that still exist that are so buried now on Google they might as well not exist. Nowadays searching any topic seems to always lead you to CNN and Microsoft and Facebook and other huge corporations. Search results are just becoming more sanitized and beige and meaningless every day.

dorkwood · on May 16, 2020

For years, my trick for finding interesting content was to go to, say, the 7th page of Google results, and start there. This doesn't work anymore -- it's SEO-optimised listicle blog posts all the way down.

My trick now is to use Twitter to discover interesting people, and follow them there. Granted, it's not a search engine, but it's at least given me the ability to discover weird things again.

gbin · on May 17, 2020

I agree on the original issue but social media tailored to people's own bubble is probably not a good source for enlightenment either...

dorkwood · on May 17, 2020

I think it depends how you use it.

One of the things I enjoy doing on Twitter is posting up something I'm working on, and then clicking through to all the profiles of the people who like, comment, or retweet my work. I stumble across an incredibly diverse range of people by doing this, many with conflicting opinions to my own, and many who belong to strange subcultures that I don't understand, but who were all drawn to my work for one reason or another.

I think there's definitely a danger of crafting a bubble for yourself if you choose to use it that way, but as a tool for discovering people making cool stuff who otherwise wouldn't cut through the noise on something like Google search, I haven't found anything better.

tomaskafka · on May 17, 2020

Many times I have thought about what could happen if Twitter asked you to recommend up to 3-5 people you value, and write a tweet-sized (or shorter) recommendation.

Over time you would get a 'pagerank for people' and could do awesome stuff with that, like 'You don't know XYZ, but 3 people you trust trust her, and this is what they tell about her:' ...
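A minimal sketch of the "3 people you trust trust her" lookup, assuming recommendations are stored as a plain mapping from each person to the handful of people they vouch for. The names and the `trusted_strangers` helper are made up for illustration; this is only the one-hop version, not a full PageRank.

```python
from collections import defaultdict


def trusted_strangers(me, recommends):
    """Map each person I don't already recommend to the friends of mine who vouch for them."""
    mine = set(recommends.get(me, []))
    found = defaultdict(list)
    for friend in recommends.get(me, []):
        for person in recommends.get(friend, []):
            if person != me and person not in mine:
                found[person].append(friend)
    # People vouched for by the most friends come first.
    return dict(sorted(found.items(), key=lambda kv: -len(kv[1])))


# Hypothetical data: each person lists the people they recommend.
recommends = {
    "me": ["ann", "bob", "cy"],
    "ann": ["xyz", "bob"],
    "bob": ["xyz"],
    "cy": ["xyz", "dee"],
}

for person, vouchers in trusted_strangers("me", recommends).items():
    print(f"You don't know {person}, but {len(vouchers)} people you trust do: {vouchers}")
```

A real ranking would need to propagate trust over more than one hop and resist the gaming mentioned in the replies; this only shows the basic lookup.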

halfdan · on May 17, 2020

Until that gets screwed using social engineering and people optimising for being affiliated with people writing them good reviews.

Terretta · on May 17, 2020

You know, for people with Klout.

https://en.wikipedia.org/wiki/Klout

edeion · on May 18, 2020

I was about to ask about dmoz.org. But apparently it's dead. We could probably do something with bookmark sharing à la Delicious. Good dead things for a better future.

Scoundreller · on May 16, 2020

Heh, I was trying to do research on coronaviruses (of which COVID-19 is one of many coronaviruses), but Google sanitized the result and only showed me "official" COVID-19 resources and buried the broader coronavirus resources.

https://www.google.com/search?q=coronavirus

stolenmerch · on May 16, 2020

COVID-19 isn't a coronavirus, it's the disease caused by the SARS-CoV-2 virus.

ganstyles · on May 16, 2020

Giving you the benefit of the doubt, and assuming this isn't just pedantry, especially since you're getting downvotes (because I assume everyone thinks this is just pedantic correction) I looked it up.

In the context of "trying to do research on coronaviruses" your comment appears to be not only correct but an important distinction, rather than the pedantry it appears to be.

From Wikipedia: "...more lethal varieties [of coronaviruses] can cause SARS, MERS, and COVID-19."

And...

"Severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2] is the strain of coronavirus..."

I learned something today!

ccheney · on May 16, 2020

I also had this epiphany awhile back and what helped me understand the difference was to think about how HIV (the virus) causes AIDS (the disease).

teamspirit · on May 16, 2020

Further, CoViD-19 literally stands for: Co[rona] Vi[rus] D[isease] - [discovered in 20]19 - "D" standing for disease caused by this particular strain.

To be honest, I was a bit disappointed when I found out, though I admit now it's a little refreshing to have it be so simply named.

Theodores · on May 17, 2020

Which can be further abbreviated as C19. I have seen this in personal chats and wonder how long it will be before it gets into newspaper headlines where space is at a premium in print editions.

Wistar · on May 17, 2020

I already see it in publications but more frequently see the variant C-19.

jkoudys · on May 16, 2020

Complaining about pedantry is the new pedantry.

acituan · on May 16, 2020

I know you are relaying the public information accurately, but I wish authorities pushed better names. Like calling the virus "the virus that we know has a corona and causes these symptoms" and the disease "the disease caused by this virus that has a corona and that causes these symptoms" is circular. Also, it is not true that it is entirely a respiratory syndrome. There are serious non-respiratory symptoms, extent of which we are to discover. Finally, if it is a syndrome causing virus, by definition we wouldn't have the crisp boundaries of a disease around it, which indeed we don't.

If these were names for services and classes that came in a code review, how many would really approve?

pwdisswordfish2 · on May 16, 2020

American media calls both disease and virus "corona". They seem to care little about such details.

perl4ever · on May 17, 2020

I've been using "covid". Don't know if it will catch on, but I feel it's important for efficiency's sake to try to save one letter.

docbrown · on May 16, 2020

[retracted] and I hope this is just a misunderstanding. As the director of the World Health Organization (WHO) said, 2019-nCoV is a novel (new) coronavirus.[0] The CDC defines coronavirus as a virus that was not previously known — check the FAQ, “what is a novel coronavirus?”[0.5]

They changed the name of this coronavirus to reflect the disease more accurately to COVID-19.[1]

The CDC has a list of other coronaviruses that have existed.[2]

0: https://twitter.com/DrTedros/status/1227297754499764230?s=20

0.5: https://www.cdc.gov/coronavirus/2019-ncov/faq.html

1: https://www.who.int/dg/speeches/detail/who-director-general-...

2: https://www.cdc.gov/coronavirus/types.html

——

Edit: Since there seems to be a misunderstanding on everybody’s part about this, as it’s referred to as both and often interchangeably in a mainstream setting, take a look at the Johns Hopkins guide: https://www.hopkinsguides.com/hopkins/view/Johns_Hopkins_ABX...

okintheory · on May 16, 2020

No, the comment above is correct, and your own links show this.

From link [2]: "SARS-CoV-2 (the novel coronavirus that causes coronavirus disease 2019, or COVID-19)"

The previous comment was just making the point that the (new) virus is called SARS-CoV-2 and the associated disease is called COVID-19.

COVIDisntCORONA · on May 16, 2020

Excuse the incivility, but no.

COVID-19: disease caused by SARS-CoV-2

SARS-CoV-2: strain of SARS-CoV

SARS-CoV: severe acute respiratory syndrome coronavirus

Coronavirus: virus that causes respiratory diseases in mammals, such as SARS (SARS-CoV) MERS (MERS-CoV), and COVID-19 (SARS-CoV-2)

tacon · on May 16, 2020

>SARS-CoV-2: strain of SARS-CoV

Excuse the incivility, but no. SARS-CoV-2 is not a strain or type of SARS-CoV. The viruses share ancestors, but SARS-CoV-2 did not come directly from SARS-CoV. SARS-CoV and SARS-CoV-2 are in the category of beta coronaviruses[0].

"The whole genome-based phylogenetic analysis presented that two Bat SARS-like CoVs (ZXC21 and ZC45) were the closest relatives of SARS-CoV-2."[1]

[0] https://www.mdpi.com/2076-0817/9/3/240/htm

[1] https://www.mdpi.com/pathogens/pathogens-09-00240/article_de...

COVIDisntCORONA · on May 16, 2020

Excuse the incivility once again, but no.

While we're on the topic of linguistic pedantry, strain isn't exclusive to direct mutations from a parent genome. Strains, like much of biological taxonomy, are a human abstraction to make it easier to communicate an idea -- in this case, "a virus sharing similar properties to coronaviruses that cause severe acute respiratory syndrome" -- albeit this is a very simplified definition for the sake of brevity.

SARS is caused by SARS-CoV-1 and COVID-19 is caused by SARS-CoV-2.

Rather, if we would like to be absolutely correct about these classifications, we would say SARS-CoV-1 and SARS-CoV-2 are both strains of SARSr-CoV (Severe acute respiratory syndrome related coronavirus), which in itself is a species, an abstract concept used to group related organisms into a convenient umbrella term.

There is no "eukaryote" organism the same way there is no "SARSr-CoV" organism. The added "r" was a recent addition when COVID-19 was discovered.

I will cede that I didn't specify this last point, and you were correct to point it out.

egberts1 · on May 17, 2020

Wow. A civil exchange on proper use of terms by two disparate but related technological domain experts. It’ll go on forever.

tacon · on May 16, 2020

>we would say SARS-CoV-1 and SARS-CoV-2 are both strains of SARSr-CoV

Thank you for making my point, again.

COVIDisntCORONA · on May 16, 2020

I don't believe I did.

Genera -- as in SARS-CoV-2's genus is Betacoronavirus -- don't have "strains."

Only families -- such as the SARSr-CoV family -- have strains.

nulbyte · on May 17, 2020

To GP's point, your initial reply said:

> SARS-CoV-2: strain of SARS-CoV

GP was pointing out that this was incorrect, and you just made that point by stating it yourself.

Assuming you are intending to engage in the conversation and not be a pedant, I might let you know that your replies are coming across quite coarsely. More specifically, as to prefaces on earlier comments, there is no need to excuse incivility, because there is no need for incivility here.

aSplash0fDerp · on May 17, 2020

I found the exchange to be more than civil, with pleasantries not being taken in the literal sense.

At least this did not fall into the category of "Cold regurgitation of data" (quite popular it seems) and had a level of warmth that was an indication of passion, more than anger (from all parties).

If they added a temperature social cue to HN comments..... That would be funny.

ArnoVW · on May 16, 2020

I'll add that the family of viruses is called Corona virus because they are 'crown shaped'

amelius · on May 16, 2020

Kind of strange that the name of the virus contains the symptoms of the disease (SARS = severe acute respiratory syndrome) while the disease doesn't.

rczhang · on May 16, 2020

You can just search for 'Coronaviridae' instead. The overwhelming majority of people searching for coronavirus want pandemic news.

pwdisswordfish2 · on May 16, 2020

"Is there a search engine which excludes the world's biggest websites?"

There was a "rebranded" web search that someone created a number of years ago and posted on HN that aimed to exclude the top websites from results. I cannot remember the name he gave to the project.

One way to exclude the world's biggest websites when using Google is to restrict the search to TLDs other than .com, .net and .org. The root zone is full of silly new TLDs that no one uses for large websites. There are hundreds to choose from.

https://www.google.com/search?q=coronavirus+site:edu
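As a rough sketch, the TLD restriction above can be generated rather than typed by hand. The `tld_restricted_url` helper is hypothetical; joining several `site:` filters with `OR` is an assumption about how the engine parses the query, so check the results before relying on it.

```python
from urllib.parse import urlencode


def tld_restricted_url(terms, tlds):
    """Build a Google search URL limited to the given top-level domains."""
    # One "site:" filter per TLD; several filters are joined with OR (assumed syntax).
    site_filter = " OR ".join(f"site:{tld}" for tld in tlds)
    return "https://www.google.com/search?" + urlencode({"q": f"{terms} {site_filter}"})


print(tld_restricted_url("coronavirus", ["edu"]))
print(tld_restricted_url("coronavirus", ["edu", "museum"]))
```

The first call produces the same search as the link above, with the colon percent-encoded.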

Looks like Google Scholar is including a number of "coronavirus links" on the main page but thankfully not in the results.

https://scholar.google.com

https://scholar.google.com/search?q=coronavirus

Why not skip Google and "web search" and use a database that does not include all the crap one finds on the www

Something like

https://pubmed.ncbi.nlm.nih.gov?term=coronavirus

or

https://search.crossref.org/?q=coronavirus

ikeyany · on May 17, 2020

In that case, I would omit webpages from the past year.

drran · on May 16, 2020

Try http://biomed-sanity.com/ .

nojito · on May 17, 2020

Why would Google showcase non official sources?

40four · on May 16, 2020

It is kind of funny that we talk about SARS-COV-2 as if it is the only coronavirus. Coronavirus, singular. If I’m not mistaken the common cold is in the corona virus family.

computerfriend · on May 16, 2020

The common cold is caused by coronaviruses and rhinoviruses. The skew is heavily towards rhinoviruses.

40four · on May 16, 2020

Rhinovirus, that’s right. Thanks. I saw an article recently about colds caused by coronaviruses so it was fresh in my head.

chris_f · on May 16, 2020

If you are looking for a forum search, check out https://boardreader.com/.

I have a theory that web crawling alone is not the best way forward to find the most relevant results because of the volume of content continually being created, much of which is niche and sometimes dynamic.

Instead I believe linking together vertical search sources that have targeted information based on search intent will provide better results.

I created Runnaroo [0] for that purpose. If you search a programming question, it will pull traditional organic results from Google, but it will also directly query Stack Overflow for a deeper search.

[0] https://www.runnaroo.com

scottlocklin · on May 17, 2020

This is really good!

https://www.runnaroo.com/search?term=J+apply+verb

versus google's completely useless:

https://www.google.com/search?safe=active&q=J+apply+verb&oq=...

Everyone else, get in here: this is top notch stuff.

lazyjones · on May 16, 2020

This is somewhat ironic because 20 years ago, hobbyists would frequently put their obscure personal pages on Geocities and other large corporations' web space.

wolco · on May 16, 2020

Memory can be foggy, but the most useful pages were hosted on university sites or in random folders off random domains, or you'd get a subdomain. I picked the username 'search', which gave me search.batcave.net. That worked great until one day they just took over the subdomain for a site-wide search. They were confused when I complained.

Sure, people hosted on Geocities and Tripod, and those were the biggest and easiest to remember. But the quality of a Geocities page compared to an MIT student page was much lower.

CalRobert · on May 16, 2020

20 years ago, it was extremely common for your ISP to give you 5MB or whatever of space to use. users.ispname.com/~yourname or whatever. It was great, tbh, since anyone with cuteftp and notepad could publish to the world.

ObsoleteNerd · on May 16, 2020

Still common now, at least in Australia. My ISP gives me 1GB of hosting for a website and 10 mailboxes.

fedede · on May 17, 2020

Hey! I actually liked this idea and I'm considering starting a learning project on it. I've seen a lot of interest and ideas in the comments, and decided to create a very short Google form to start gathering all the interested people so we can organise something interesting. Is anybody in? :)

https://forms.gle/5KuTYVdYaMzRD2n78

ryandrake · on May 17, 2020

If there was a way to simply exclude from searches: shopping, news, images, videos, and “listicles”, I think that would get us most of the way.

Especially shopping. The endless stores are the worst part of search results. If I search for anything that remotely looks like a product, the results are just choked with store after store trying to sell me the thing. Awful.

type0 · on May 17, 2020

If you do your own search engine like that, implementing rules that block affiliated commerce blogs from appearing would help a lot.

sanqui · on May 16, 2020

There is a search engine with this exact goal: https://millionshort.com/.

I haven't had that great results with it myself though.

smackay · on May 16, 2020

I tried it with "waders" which are either the things that you put on your feet to go fishing or a category of birds (shorebirds or herons). The results after going for all the options were still exclusively stores wanting to sell me the former.

Garbage in, garbage out, I guess. Still, I like the idea of something to side-step the SEO. Perhaps with more effort they can make it work, but relying on Google or any major search engine for the base results is the wrong way to go.

glenstein · on May 16, 2020

I tested your search term and had a similar experience. I have, however, had positive experiences with other categories, such as philosophy. Searching Wittgenstein with the top million sites removed, I found some gems: a play, a Disney character that was a supercomputer named after Wittgenstein in a direct-to-video movie which I learned was later partly the inspiration for Wall-E, Wittgenstein-oriented societies, awards, and general philosophy references I had never heard of.

I suppose it depends on the category.

spaceman_2020 · on May 16, 2020

I've found that you get far better results for many queries on YouTube now since it's tougher to mass spam YouTube with low quality content (compared to churning out a $5 article).

Nevada-Smith · on May 16, 2020

Google Scholar "waders" [1]

[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C29&q=wad...

behnamoh · on May 16, 2020

Man it's super fast! I wish Google were like this.

webspinner · on May 19, 2020

Oh, I've had epic results with Millionshort!! I mean, as far as stripping most everything away, and just surfing.

erikbye · on May 16, 2020

For google you can use this https://addons.mozilla.org/en-US/firefox/addon/g-search-filt..., just drop in your list of those 500 URLs, once you've decided on what the top 500 is.

For other engines you can use https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/ with this script https://greasyfork.org/en/scripts/1682-google-hit-hider-by-d...

atrudeau · on May 17, 2020

For Chrome: https://chrome.google.com/webstore/detail/google-search-filt...

terrycody · on May 17, 2020

really nice, thx for share!

thekyle · on May 16, 2020

There is Million Short which allows you to search without the top 100, 1k, 10k, 100k, or 1m sites. Personally what I'd like to see is a search engine that only indexes webpages without ads since that should eliminate lots of the SEOd garbage. It would also be nice to use the text to code ratio to derank JS heavy sites.

https://millionshort.com/
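A minimal sketch of the text-to-code ratio idea, assuming "text" means the visible characters outside `<script>` and `<style>` and "code" means the whole HTML payload. The `text_to_code_ratio` helper is hypothetical and uses only the standard library parser; a real crawler would also need to account for externally loaded scripts.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that is not inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_to_code_ratio(html):
    """Visible text length divided by total HTML length (0.0 for empty input)."""
    if not html:
        return 0.0
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so indentation does not count as content.
    visible = " ".join("".join(parser.chunks).split())
    return len(visible) / len(html)


plain = "<p>hello world</p>"
heavy = "<p>hi</p><script>" + "x = 1;" * 50 + "</script>"
print(text_to_code_ratio(plain), text_to_code_ratio(heavy))
```

Pages scoring below some threshold could then be deranked; where that threshold should sit is a judgment call this sketch does not answer.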

b0ner_t0ner · on May 17, 2020

> like to see is a search engine that only indexes webpages without ads since that should eliminate lots of the SEOd garbage

Not just ads, but also ranked by the number of third-party cookies/tracking scripts a site has.
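One way to approximate this from static HTML, as a sketch: count the distinct external hosts that a page loads scripts from. The `third_party_script_hosts` helper is hypothetical, treats any hostname different from the page's own as third-party (so a site's own CDN subdomain counts too), and cannot see cookies or scripts injected at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _ScriptSources(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_script_hosts(html, page_url):
    """Return the set of script hosts that differ from the page's own host."""
    page_host = urlparse(page_url).hostname
    parser = _ScriptSources()
    parser.feed(html)
    hosts = set()
    for src in parser.sources:
        host = urlparse(src).hostname  # None for relative URLs like "/app.js"
        if host and host != page_host:
            hosts.add(host)
    return hosts


page = (
    '<script src="/app.js"></script>'
    '<script src="https://tracker.example/t.js"></script>'
    '<script src="https://blog.example/b.js"></script>'
)
print(third_party_script_hosts(page, "https://blog.example/post"))
```

The size of the returned set could serve as the ranking penalty.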

joshspankit · on May 17, 2020

And blacklist anything with those outbrain blocks. “One weird trick to _____.”

Very surprised where I see those these days, and they always make me run away.

dorkwood · on May 17, 2020

The fact that news websites happily adopted those is disgusting. They're literally tricking people into thinking it's news content.

icedistilled · on May 17, 2020

And they also, for the longest time, strongly turned me off paying for any news sites. The more they use scummy ads that turn people off, the more they need ad revenue.

That's a nice negative feedback loop or catch-22

tlarkworthy · on May 16, 2020

I made a script on ObservableHQ to surf YouTube pseudo-randomly https://observablehq.com/@tomlarkworthy/random-place-on-yout...

I do a random city + documentary as the search term. It's taken me all over the world and shown me some very strange things.

One of my favourites was Aarhus, which had a Danish language rapper proclaiming he was putting Aarhus on the global map (I have never heard of the city of Aarhus). https://youtu.be/WSZxuzgImLo They dis Copenhagen a lot too, lol. You get a more intimate YouTube experience with the low view videos

But I have also seen amazing religious rituals, and an excellent documentary on Karachi.

Because it's observable hq you can fork it and figure out your own algorithm for biasing the random.
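For anyone who prefers to experiment outside Observable, here is a rough Python sketch of the same "random city + documentary" trick. It is not the notebook's code. The city list and the `random_search_url` helper are made up, and the optional `weights` argument is one simple way to bias the random pick.

```python
import random
from urllib.parse import urlencode


def random_search_url(cities, weights=None, rng=random):
    """Pick a city (optionally weighted) and build a YouTube search URL for it."""
    city = rng.choices(cities, weights=weights, k=1)[0]
    query = urlencode({"search_query": f"{city} documentary"})
    return "https://www.youtube.com/results?" + query


# Hypothetical list; weights favour the smaller, less-covered cities.
cities = ["Aarhus", "Karachi", "London"]
print(random_search_url(cities, weights=[5, 3, 1]))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.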

totemandtoken · on May 16, 2020

Reminds me of this classic pg essay: http://www.paulgraham.com/ambitious.html

Specifically this quote: "The way to win here is to build the search engine all the hackers use. A search engine whose users consisted of the top 10,000 hackers and no one else would be in a very powerful position despite its small size, just as Google was when it was that search engine."

There has been a lot of grumblings about the state of search these days. Maybe the time is nigh for a new search engine?

tetris11 · on May 16, 2020

I feel that we should go down the adblock hosts list approach, where people download website lists from individuals they trust who have curated or scraped links of websites complete with keywords, and it's up to the user to refresh their lists and then perform a search on their website.txt file.

It will be limited, but still quite powerful, similar to the way that we can pick and choose different host file sources from the web.
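A minimal sketch of searching such a list, assuming a made-up file format: one site per line, the URL, a tab, then comma-separated keywords, with `#` for comments. Nothing here is an existing standard.

```python
def parse_list(text):
    """Parse 'url<TAB>keyword, keyword' lines into (url, keyword set) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        url, _, keywords = line.partition("\t")
        words = {k.strip().lower() for k in keywords.split(",") if k.strip()}
        entries.append((url.strip(), words))
    return entries


def search(entries, query):
    """Return URLs that match at least one query term, best matches first."""
    terms = set(query.lower().split())
    scored = [(len(terms & words), url) for url, words in entries]
    return [url for score, url in sorted(scored, key=lambda s: -s[0]) if score > 0]


# Hypothetical contents of a downloaded website.txt.
website_txt = (
    "# curated by someone I trust\n"
    "https://birds.example/waders\twaders, birds, shorebirds\n"
    "https://fishing.example/gear\twaders, fishing\n"
)
print(search(parse_list(website_txt), "waders birds"))
```

Merging lists from several curators is then just concatenating the parsed entries.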

rthomas6 · on May 17, 2020

Does anyone else remember StumbleUpon? It's not exactly the same as a search engine, but that worked really well for finding interesting content back in the day.

ublaze · on May 17, 2020

I've wondered if this can be built with a limited budget, in terms of cloud costs. At least it feels like you'd need a lot of hosts to satisfactorily index the web, and store the results so that you can return results instantaneously.

joshspankit · on May 17, 2020

With limited budget and the right results, I think people would pay with “non-instant” results. Even 5-10 seconds might be perfectly fine, and that’s “easy” on a limited budget.

_xnmw · on May 16, 2020

DEVONagent is a highly configurable search utility which can be used to combine and de-duplicate results from multiple search engines at once, exclude sites or keywords from a blacklist, follow deep links within search pages, and perform some filtering logic on the text of results.

Before I knew about DEVONagent I would often just search multiple engines and sources trying to find something particular (e.g. a particular PDF) or unique results.

https://www.devontechnologies.com/apps/devonagent
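The combine, de-duplicate, and blacklist steps can be sketched generically. This is not how DEVONagent works internally; `merge_results` is a hypothetical helper that takes result URLs already fetched from several engines and treats two URLs as duplicates when host (minus "www.") and path match.

```python
from urllib.parse import urlparse


def merge_results(result_lists, blocked_domains):
    """Merge URL lists in order, dropping duplicates and blocked domains."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            parsed = urlparse(url)
            host = parsed.hostname or ""
            if host.startswith("www."):
                host = host[4:]
            # Skip the domain itself and any of its subdomains.
            if any(host == d or host.endswith("." + d) for d in blocked_domains):
                continue
            key = (host, parsed.path.rstrip("/"))
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


engine_a = ["https://www.example.org/a/", "https://pinterest.com/pin/1"]
engine_b = ["http://example.org/a", "https://blog.example.net/post"]
print(merge_results([engine_a, engine_b], ["pinterest.com"]))
```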

lemonberry · on May 16, 2020

Thank you for the link. This looks really cool. I used DEVONthink years ago. It seemed like a great piece of software but I didn't have a great use case for it. Looking forward to checking out DEVONagent.

petra · on May 16, 2020

It looks really interesting. Sadly, it's Mac only.

Does anybody know of something similar for Windows or Linux?

brightball · on May 16, 2020

This does look really interesting. Thank you.

pavelmark · on May 16, 2020

Simply removing Pinterest would be a huge step in the right direction.

joshuaissac · on May 16, 2020

I use an add-on called Unpinterested! to remove Pinterest results from my Google search results:

https://github.com/sellomkantjwa/unpinterested

All it does is add -site:pinterest.com to the search bar for image results (can be configured to also do it for Web results), but it gets the job done.
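The same trick is easy to sketch for any list of unwanted domains. This is not the add-on's code, just a hypothetical `exclude_sites` helper that appends `-site:` operators to a query string.

```python
def exclude_sites(query, domains):
    """Append a -site: exclusion for each domain not already excluded."""
    parts = [query.strip()]
    existing = set(query.split())
    for domain in domains:
        token = f"-site:{domain}"
        if token not in existing:
            parts.append(token)
    return " ".join(parts)


print(exclude_sites("diy bookshelf", ["pinterest.com", "quora.com"]))
```

Running it twice on the same query adds nothing the second time, so it is safe to apply on every search.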

pier25 · on May 16, 2020

And quora

stock_toaster · on May 16, 2020

ye gods, yes.

user9429450 · on May 16, 2020

Any insight with regard to how Pinterest was able to do what they've done? Did they simply dump more money into it?

chaos_a · on May 16, 2020

https://wiby.me/ exists to solve this exact problem. I've found some pretty neat/odd websites on it in the past.

lucb1e · on May 16, 2020

The "About" information is, counter-intuitively, under "Settings":

> In the early days of the web, pages were made primarily by hobbyists, academics, and computer savvy people about subjects they were interested in. Later on, the web became saturated with commercial pages that overcrowded everything else. All the personalized websites are hidden among a pile of commercial pages. [...] The Wiby search engine is building a web of pages as it was in the earlier days of the internet.

btrettel · on May 16, 2020

That makes me think: Is there a search engine which removes pages with any ads or affiliate links? That might be the easiest way to remove the commercial pages.

lucb1e · on May 16, 2020

I was just thinking pages with external dependencies, in the spirit of the old web, but your idea sounds a lot more reasonable. Not sure if that exists, I'd be interested!

generalpass · on May 16, 2020

It is a carefully curated directory, which is problematic.

For example, I submitted Pizza Hut's archived original web page [1], but it wasn't added.

Even for a search engine exposing niches, updating a directory manually will likely be too slow, unless the directory is maintaining a single niche (e.g., unladen airspeed of every species of swallow), but then we end up with some insane number of search engines and how to select which one?

[1] http://www.pizzahut.com/assets/pizzanet/home.html

kd5bjo · on May 16, 2020

Especially if you’re focussing on evergreen information, there’s no reason why people can’t have their own personalized crawler and index— I’ve occasionally thought about rolling my own with a browser extension that lets me add seeds at the click of a button.

zxexz · on May 16, 2020

I've been working on something like this for my own use - I'm not a fan of browser-based history. My home-rolled solution is starting to be good enough where I can use it to easily find exactly what I'm looking for, assuming I've previously read it, by both searching the title and URL, as well as the content on that page (my major gripe with "History" in Chrome and Firefox is that it doesn't search the page content, and if it did, syncing it would have major privacy concerns).

The problem I'm running into is that I still have to use major search engines to find new content, way more than I'd like. I hope to make my local service available open source once I have 'federated' history search working, so that we can have a primitive search engine and share with people we trust. Also need to work out some security issues - it's scary having all the content you read and see on your home network, protected only by your hackily-patched-together security.

EDIT: Actually I'd like to elaborate a bit more in case anybody actually reads this and has any ideas. On the desktop side, it's pretty easy. Initially started out MITMing my own traffic with a self-signed cert added as a root cert to all my machines. This only works on my home network, so I did a VPN thing. This was way too clunky and the security concerns are innumerable. I ended up biting the bullet and writing a chrome extension which works wonderfully, except for some slight performance issues.

However, I wish to also archive my phone content - I read just as much on my phone as my computer. I can do it on Android with the MITM process, but the same issues as above still apply, and it doesn't work with iOS (at least I can't find a way).

I'm thinking of taking an open source project, like Firefox/Fennec and building it in to the app itself. In that case it may make sense to forgo the browser extension and just roll my own forked browser on every platform, even iOS. I don't know much about iOS dev though.
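To make the "search title, URL, and page content" part concrete, here is a toy in-memory inverted index. It is only a sketch of the idea, not the poster's implementation. `HistoryIndex` is a hypothetical class, and a real version would persist to disk and rank results rather than just intersecting term sets.

```python
import re
from collections import defaultdict


def _tokens(text):
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class HistoryIndex:
    def __init__(self):
        self.titles = {}                  # url -> title
        self.postings = defaultdict(set)  # token -> urls containing it

    def add(self, url, title, content):
        """Index a visited page by its URL, title, and body text."""
        self.titles[url] = title
        for token in _tokens(" ".join((url, title, content))):
            self.postings[token].add(url)

    def search(self, query):
        """Return URLs of pages that contain every query term."""
        terms = _tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = HistoryIndex()
index.add("https://example.org/j-verbs", "J apply verb", "Notes on the apply verb in J.")
index.add("https://example.org/bread", "Bread", "Flour, water, salt.")
print(index.search("apply verb"))
```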

ehonda · on May 16, 2020

I clicked your link, but I don't see an archive, it's redirecting me to their main website.

Wiby is based around two main things:

Non commercial content (1) that does not rely heavily on excessive javascript and CSS (2).

http://wiby.me/submit contains the submission criteria.

bsanr2 · on May 16, 2020

I tried this yesterday. It seems biased towards the interests of the curators, and, like Google in regards to "some ideal average consumer", is therefore useless if you fall outside a certain level of similarity to the target demo.

For example, I enjoy weightlifting and strength sports. I did a search for "muscle", and every result but one was using the word "muscle" as a figurative metaphor. Barely anything about actual muscles. Searching "funk" was just as bad. One page about Motown and a LOT of midis.

keenmaster · on May 16, 2020

What if Google Advanced Search produced a visual network map which showed you the salient clusters of terms related to your search? You’d then be able to click on a cluster and the search results would change to adjust to what you’re really searching for.

Ex: The network map for “weightlifting” would include many clusters, but 2 big ones would be the hypertrophic cluster (surrounded by a bunch of related terms) and toning cluster (calisthenics would be under this cluster for example). Click on either and the results will change accordingly.

This would actually work even better for subjects you don’t know much about, because Google will teach you about the salient clusters in that field. The clusters could be enhanced with popular images associated with each term. Popular clusters would display as larger than others.

miek · on May 17, 2020

I like this idea a lot. Wonder if anyone is working on this.

keenmaster · on May 17, 2020

No one that I’m aware of. I wouldn’t mind if Google hired me to help make it a reality. I have an email address in my bio (lol).

ehonda · on May 16, 2020

Hi, just wanted to clear something up. Wiby is biased towards the interests of those who submit websites to it.

EmilioMartinez · on May 17, 2020

Just added wiby.me/surprise to the bookmarks bar. Amusingly, the icon keeps changing every time I use it.

Aeolun · on May 16, 2020

But to add my site I have to add every page individually. Nobody ain’t gonna use that.

I’d have to submit every blog post?

ngold · on May 17, 2020

Can't wait to try this search engine. Thanks for the link.

mikekchar · on May 17, 2020

Here is an idea that I've always wanted to do, but will never have time for: A curated search engine.

Basically the idea is to have people band together and "recommend" links. You then do your normal spidering of the websites to create a search engine (or even just call through to a number of existing search engines). However, the ranking of the results is based on the weighting of the recommendations.

It's essentially a white list based on your own personal bubble. Of course this won't work in general because you will always get SEO creeps spamming recommendations. However, it gives you tools for working around those creeps. The average person probably won't be able to manage it, but power users probably will.

By not trying to solve the problem for everybody, it makes it easier to solve the problem for some people. Or at least that's my thesis :-) I might be wrong.
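A minimal sketch of the re-ranking step, assuming recommendations are kept per person as sets of domains and each user assigns their own trust weight to each recommender. All names and data are hypothetical. Anyone you have not chosen to trust, such as a spammer, simply gets weight zero.

```python
from urllib.parse import urlparse


def rerank(results, recommendations, trust):
    """Reorder engine results by the summed trust of everyone recommending the domain.

    results: URLs in the order the underlying engine returned them
    recommendations: {person: set of recommended domains}
    trust: {person: weight}; people missing from this map count as 0
    """
    def score(url):
        host = urlparse(url).hostname or ""
        return sum(
            trust.get(person, 0.0)
            for person, domains in recommendations.items()
            if host in domains
        )

    # Higher score first; ties keep the engine's original order.
    scored = [(score(url), position, url) for position, url in enumerate(results)]
    return [url for _, _, url in sorted(scored, key=lambda t: (-t[0], t[1]))]


results = [
    "https://shop.example/waders",
    "https://birdblog.example/waders",
    "https://forum.example/t/1",
]
recommendations = {
    "alice": {"birdblog.example"},
    "bob": {"birdblog.example", "forum.example"},
    "spammer": {"shop.example"},
}
trust = {"alice": 1.0, "bob": 0.5}
print(rerank(results, recommendations, trust))
```

Because the trust map is personal, two users running the same query against the same recommendations can get different orderings.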

netsectoday · on May 17, 2020

You can boot up your own custom search engine in a few minutes with YaCy (Ya See!) an open-source, P2P, Dockerized crawler and search engine built on top of Solr.

https://yacy.net/

If you're generous; you can make your index available to other P2P instances.

I wanted to run an API search the other week and was blown away with how quickly I could prop up my own custom search portal (I didn't want to pay for API access to other search engines, and YaCy comes with JSON and Solr endpoints).

I ran it locally to test my crawl filters, then pushed a private instance out to Digital Ocean to turn up the heat with the crawling. The only issue I had was the crawler would hit the max memory threshold on long crawls and the container would restart, but that was fixed by scaling up the box.

l72 · on May 18, 2020

I have my own yacy search engines running internally (non-peered) for similar reasons. One crawls some key code documentation sites that I need for work, and another crawls a whole bunch of music blogs.

While I typically still use RSS for reading music blogs, I find having the search engine is a great way to go back and find something or discover something new! Every time I find a new blog, I just add it as an index to yacy to crawl.

I think it'd be great to see people spinning up larger instances that are highly specialized. For example, maybe a search engine that is dedicated solely to sci-fi and only crawls high quality boards, personal sites and blogs, and skips all the spammy, seo-optimized sites.

crawlcrawler · on May 16, 2020

I built a search engine for this and other, similar purposes. With Crawl Crawler you start out by searching the meta data of a Common Crawl ("CC") crawl. Then you define a subsection of that data collection by designing a query whose search results include your favorite sites. Then you enrich that subsection by linking those meta data documents (that come from CC's WAT repo) to full text extracts or HTML from CC's WET repo or the WWW. Then you set it to refresh that section on a recurring basis. Voila! You have created a search index that includes your preferred sites. https://crawlcrawler.com

chris_f · on May 16, 2020

This is pretty cool. I always wondered why there wasn't a user interface search somewhere for the CommonCrawl data.

allwynpfr · on May 16, 2020

You should try Million Short. As the name suggests, it takes out the first 100 / 1k / or a million results so you're left with those that aren't all that popular. That seems to be what you're looking for. https://millionshort.com/

nic-waller · on May 16, 2020

My hobby project is https://random.surf (works better on desktop than mobile).

I share that same desire to visit the web less travelled. I want to discover interesting sites that deserve to be bookmarked because they will never show up in a search engine.

77ko · on May 17, 2020

Love it! Discovered something interesting very quickly. Bookmarked for future use.

webspinner · on May 19, 2020

That's definitely spinning the web, which is a whole lot of fun.

severine · on May 17, 2020

Thank you!

True "Interdimensional Cable" vibes.

tetris11 · on May 16, 2020

This is excellent

dangoljames · on May 16, 2020

There used to be a Java applet embedded in altavista.com's website that could be run against search results. It would do semantic processing on the results and present a list of generated terms, each with a checkbox. Checking a box would pull any returns which contained the topic from the remaining search results.

This was fire. If a topic were being discussed on the web, you could find it with this tool. Unfortunately, it did not fit the vision of the parasitic overlords who bred us to produce and consume for their benefit.

visarga · on May 16, 2020

Altavista itself was a junk search engine though, especially after they sold out and the new owner stuffed it with ads.

dennisy · on May 16, 2020

I think you could get good results if you just penalise sites for the number of third-party JS scripts they load, which is a proxy for a more established site/corp.

You could add a bunch of heuristics such as size, number of links etc.

Maybe even train a classifier to select the “smaller” part of the web.
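
As a rough illustration of the third-party JS heuristic, here is a minimal Python sketch. It assumes you already have a page's URL and HTML from some crawler. The names `ScriptCounter`, `third_party_script_count` and `penalised_score`, and the penalty formula itself, are made up for this example and are not taken from any real search engine. Note that it compares exact hostnames, so a site's own CDN subdomain would also count as third party.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptCounter(HTMLParser):
    """Collects the hosts a page loads external <script src=...> files from."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname or ""
        self.third_party_hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # inline script: not counted by this heuristic
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(self.page_url, src)).hostname or ""
        # Assumption: "third party" means any hostname other than the page's own.
        if host and host != self.page_host:
            self.third_party_hosts.append(host)


def third_party_script_count(page_url, html):
    counter = ScriptCounter(page_url)
    counter.feed(html)
    return len(counter.third_party_hosts)


def penalised_score(base_score, script_count, penalty=0.1):
    """Hypothetical ranking tweak: shrink the score as third-party scripts pile up."""
    return base_score / (1.0 + penalty * script_count)
```

The other heuristics mentioned (page size, number of links) could be folded into the same score, or used as features for the classifier.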

inopinatus · on May 16, 2020

I would pay real subscription money for a search engine that focused on knowledge-oriented results rather than retail and commercial results.

When I type “shoes”, it would give me: links for the functional and creative history of footwear, the taxonomy of shoes, methods of construction, current and historical footwear industry data, synonyms and antonyms, related terms and professions, the dictionary definition, and similar links related to secondary meanings (such as any protective covering at the base of an object, horseshoes etc). I’d also hope for a comedy link to a biography of Cordwainer Smith.

What I actually get, which I don’t want at all: pages and pages of shoe shopping.

The various means to exclude “top X sites” are the roughest possible heuristic in that direction, and throw out the baby with the bathwater (for example, a long-established manufacturer may well have an informational online exhibit)

Google has essentially failed me in its primary mission. Bing at least has the grace to admit they are here to “connect you to brands”. And sadly, right now, every other option is an also-ran.

In practice I use DDG, directed by !bangs towards known encyclopaedic or domain-specific sources. I am certain that I’m missing out.

seektable · on May 17, 2020

What you describe sounds like a personal knowledge base that sits on top of existing search engines, public and maybe even private databases. The major differences are:

* when you make a query to this knowledge base, it has a history of your prev searches / preferences (not google)

* it can propose variants of suggestions on what is your intent in this particular query - and make much more detailed queries (auto include/exclude keywords websites etc) to multiple sources (not only google, maybe anonymously)

* it can parse results from these sources and re-arrange them (use its own rank system) according to your preferences. In this system, you can explicitly say - I hate that, and I like that, and this will affect the behavior. Yes this is an 'information bubble' but it is controlled by you and not by google!

* finally, this system may work in the background and handle 'research' search queries. What I mean here: currently, Google is about instant search - it gives you results in milliseconds, and that's all. It cannot spend much computation on a more precise, more intelligent check of the content behind the links in the search results - it cannot do reasoning - so you have to do that yourself: open links from the 1st page, close most of them immediately b/c they are not relevant to you, go to the 2nd page and so on. It would be cool if most of this could be automated - with modern natural language processing approaches and old-school Prolog-like reasoning this is realistic and not something fantastic from sci-fi.

My vision is that this kind of search assistant cannot be SaaS / closed source. It is about freedom - and thus it should be an open source / self-hosted app that can be deployed on a PC or on a cloud VM - with hosting controlled by end users, not companies.

I don't know if something like this exists. If not, maybe it's time to create it.

atlantique · on May 16, 2020

That sounds like the job of an encyclopedia. Maybe some sort of collaborative encyclopedia where people can edit pages and add references.

inopinatus · on May 16, 2020

I know you’re only half joking. In practice many, perhaps most of my DDG queries do end in !w - but there’s a wealth of information that is relevant, interesting, and useful, but wouldn’t be considered encyclopaedic, or that is merely summarised in Wikipedia; in addition, their references are included as supporting citations, very far from a comprehensive index of currently accessible information.

text_exch · on May 17, 2020

I've long wanted to build a search engine of only personal blogs. I am less familiar with the field of information retrieval so I haven't gotten started yet, but it's always been a dream of mine and if anyone is interested please contact me at threemillionthflower [at] the world's largest email provider.

Discovering unknown parts and blogs on the internet is one of the enduring goals of a newsletter that I run [1], which provides a single link to an interesting article every day, usually by lesser-known authors and blogs across the internet.

[1] www.thinking-about-things.com

webspinner · on May 19, 2020

I'm now a subscriber.

011-video · on May 17, 2020

You are your best search engine !

On a daily basis your brain uses shortcuts to get to the point. Open Firefox (of course), ALT+B. Then add a new bookmark, for instance :

Name : Stack Overflow

Location : https://stackoverflow.com/search?q=%s

Tags :

Keyword : st

Now if you want to search "javascript timer", just type : st javascript timer

Add "%s" to the search URL of all your favorite websites.

Example : https://en.wikipedia.org/wiki/%s

To discover some new website content, apply the same trick to Hacker news, Reddit or any RSS River.

Voila, bye bye GG.
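
For anyone curious what the keyword trick does under the hood, here is a small Python sketch of the substitution. The `KEYWORD_BOOKMARKS` table and the `expand` function are hypothetical names for this example, the "w" keyword is made up, and the percent-encoding is an assumption rather than Firefox's exact escaping rules.

```python
from urllib.parse import quote

# Keyword -> URL template, mirroring the bookmark fields described above.
KEYWORD_BOOKMARKS = {
    "st": "https://stackoverflow.com/search?q=%s",
    "w": "https://en.wikipedia.org/wiki/%s",  # "w" is a made-up keyword
}


def expand(typed):
    """Turn 'st javascript timer' into the matching search URL, or None."""
    keyword, _, terms = typed.strip().partition(" ")
    template = KEYWORD_BOOKMARKS.get(keyword)
    terms = terms.strip()
    if template is None or not terms:
        return None  # unknown keyword, or nothing to search for
    # Percent-encode the search terms and drop them into the %s slot.
    return template.replace("%s", quote(terms, safe=""))
```

With this table, `expand("st javascript timer")` produces the Stack Overflow search URL with the terms filled in.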

NateEag · on May 16, 2020

For Google, you can ignore specific sites by adding "-example.com".

See this example of filtering Stack Overflow out of search results:

https://www.google.com/search?q=loop+over+array+items+in+jav...

1f60c · on May 16, 2020

More specifically, -site:example.com, although there have been reports of Google breaking this time-honored functionality.

brentis · on May 17, 2020

Imagine if search results had table filters and sort.

Popularity, Relevance, Age, Type, etc. type could be blog, forum, site, or video. Or like it used to be.

busymom0 · on May 17, 2020

I have been finding recently that Google has been breaking their existing "sort" and "time" filters. Try searching for something with the site:reddit.com prefix, for example, and set the time filter to, let's say, "Last year". Google still shows you results from 4-5 years ago.

BostonFern · on May 17, 2020

I also discovered that recently. It's gone the way of the verbatim constraint.

Control is being forfeited to steer users back to more profitable content in order to capitalize on a captive market.

I wonder if being open about it would be so bad for business, instead of the attempt to manipulate users into enjoying the ratcheting-up of their impotence.

Now, Youtube truncates search results and loads the recommendation stream instead, long before hits are exhausted.

At least it's been a while since Silicon Valley was keeping the mythical personalized advertising spiel in active circulation.

sneeuwpopsneeuw · on May 16, 2020

I personally use Google Chrome with the DuckDuckGo search engine. DuckDuckGo is not perfect: very in-depth searches (such as "gameboy advance memory layout") only return junk, while Google knows you are searching for a niche. But on your average search it is as good as Google, sometimes better because it is more factual and promotes fewer webstores. When it does not give me what I'm looking for, I can add !g anywhere in the question and the same search is done using Google.

Then I use Violentmonkey, an open source JS/CSS injector, to inject this user script: https://greasyfork.org/nl/scripts/1682-google-hit-hider-by-d... This will block specific domains for you in Google, Yahoo, DuckDuckGo etc. I use this to block domains like Quora, sourceforge, cnet and softonic.

The nice thing about this script is that you can permaban domains you know are junk and they will be removed completely, or you can just ban a domain, such as a commercial website. When you ban something (as opposed to permabanning it), it is not removed from Google or DuckDuckGo; only its title is shown, in light gray. I'm currently experimenting with this on some major webstores, so I can't really say yet whether it will help you, but it can be a good start.

(edit) I saw some people ask why this was not possible before. Google allowed you to block domains and websites a few years ago, but they removed this feature. DuckDuckGo never allowed you to do that because it would mean having a cookie that remembers your preferences, and that is against their principles.

1f60c · on May 16, 2020

> I can add !g anywhere in the question

I knew about !bangs, but I didn't know you could put them anywhere in the query (e.g. "hello !g world" searches Google for "hello world"). This is going to save me a lot of time on mobile. Thanks!

wumpus · on May 16, 2020

If the question is "Is there a commercially-viable search engine that supports this feature", then the answer is "probably not".

Implementing this properly involves having your own search index. And that's pretty expensive.

bamboozled · on May 16, 2020

I think on DDG you can do !mil which excludes the first million top ranking sites.

Edit: Maybe it’s the first million results? I use it to find obscure things sometimes.

mmsimanga · on May 16, 2020

When researching a topic I have had great success searching HN and reading through the comments. If I want to find alternative software tools for a tool I am using the comments on HN are best. Searching through subreddits also yields better results than Google.

DavidPiper · on May 17, 2020

It feels like there could be a (partial) meta solution here:

A search engine that returns results whose pages weigh in under a certain size.

From the comments it seems most of the "cruft" filling up Google results are newer web apps, generally JS-heavy and advertising-heavy, etc.

If you had a filter for pages with (e.g.) < ABC kb of JS, < XYZ external links (excluding img tags), I feel like there'd be a good chance that the "old" web and the "unknown" web would bubble to the top.

There are plenty of false positives (particularly for "small" forums built with modern JS apps, etc), but it could be one of many filtering tools to achieve better search results.
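
A minimal sketch of such a filter in Python, assuming a crawler has already measured each page. The `PageStats` record, the function names, and the default thresholds (standing in for "ABC" and "XYZ") are all placeholders invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    js_kilobytes: float   # total JavaScript weight, as measured by a crawler
    external_links: int   # links pointing off-site, <img> tags excluded


def is_lightweight(page, max_js_kb=50.0, max_external_links=30):
    """True when the page stays under both (placeholder) thresholds."""
    return page.js_kilobytes < max_js_kb and page.external_links < max_external_links


def lightweight_only(pages, **limits):
    """Keep only the pages that pass the weight filter, preserving order."""
    return [p for p in pages if is_lightweight(p, **limits)]
```

The thresholds would need tuning against real pages, and the false positives mentioned above would still get through.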

ngold · on May 17, 2020

Great idea. Google seems to do nothing but remove search options. At least they still have a time filter. Ddg only does a year old I believe.

turnipla · on May 16, 2020

Google used to let you blacklist websites many moons ago, that would go a long way already.

Now there are a few extensions that do that, but obviously they only hide the results from each page, so sometimes you will see pages with 2 results, if any at all.

rozab · on May 16, 2020

Would be easy to just inject a negative site clause into the query, e.g. `-site:fandom.com`

chrisfrantz · on May 16, 2020

Looks like google limits search query length to 2048 characters. That’s probably enough room to exclude a majority of the biggest names.

dublin · on May 16, 2020

It would be nice if there were a way to make the exclusion list be the default for all your queries. For instance, I never want to see results from WikiHow again. Ever. Or the New York Times or any of the other paywalled sites...

kortex · on May 16, 2020

unpinterested is an extension which simply adds -site:pinterest to image searches. I don't think it'd be hard to do something similar with a custom list.
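
A custom-list version could be sketched like this in Python. The `with_exclusions` function is a hypothetical helper for this example, and the 2048-character default simply reuses the limit mentioned upthread, which I have not verified.

```python
def with_exclusions(query, blocked_domains, max_chars=2048):
    """Append a -site: clause per blocked domain, stopping before max_chars is exceeded."""
    parts = [query.strip()]
    length = len(parts[0])
    for domain in blocked_domains:
        clause = f"-site:{domain}"
        # +1 accounts for the space that joins this clause to the query.
        if length + 1 + len(clause) > max_chars:
            break
        parts.append(clause)
        length += 1 + len(clause)
    return " ".join(parts)
```

Domains are added in list order, so the ones you most want gone should come first in case the limit cuts the list short.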

methou · on May 16, 2020

I used a Google Custom Search Engine (CSE) to remove results from Softonic and the like. It works well, but it's still very Google.

petra · on May 16, 2020

Google CSE is a great idea. Tried it in the past.

But I find the search is of much lower quality than Google.

dexen · on May 17, 2020

There is a similar problem where Youtube's recommendations and auto-play are mostly big name brands, to the exclusion of individual reporters, commentators, and other content producers. Since recently, a "De-Mainstream Youtube" plugin[1] is available for Firefox and Chrome, fixing that to some extent.

--

[1] https://demainstream.com/

bmd3991 · on May 16, 2020

What I’d like to see is a search that excludes any page with ads, and any page with affiliate links. That alone would get rid of 90% of the garbage

chongli · on May 17, 2020

I've expressed this wish on HN a few times in the past as well. One downside to it that I can think of (off the top of my head) is that there may be a lot of small, personal websites that are hosted on a free blogging platform which injects ads into every page. These websites may be very valuable (and not SEO garbage link farms) but would get blocked nevertheless.

dddddaviddddd · on May 16, 2020

Sort of a server-side, page-level adblocker.

bmd3991 · on May 17, 2020

It could make for an interesting Firefox extension I think

peel40 · on May 16, 2020

I think there's a simple google way. Just add `-bigwebsite.com` to your query.

[search term] -google -youtube -facebook ... -top100website and it should work.

I found a list of the top 1m alexa websites here:

http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

An add-on with that list should do the work.

abraae · on May 16, 2020

That's pretty clunky.

- there's probably a pretty low limit for size of Google queries, you'll likely hit it quickly

- you won't be able to search for e.g a story about YouTube censoring some content

peel40 · on May 16, 2020

I don't know about the current query size limit, but I think it's pretty likely to get hit quickly, as you correctly pointed out. But it's useful to use the "-site:" operator, e.g. "-site:bigwebsite.com", to exclude just the site and not every mention of the word.

ex:

facebook censorship -site:facebook.com

https://www.google.com/search?q=facebook+censorship+-site%3A...

Mandatum · on May 18, 2020

Filtered out regional sites that Google wouldn't cater my search for (eg Taobao, Baidu, etc).. Only possible with 31 negative sites.

-site:google.com -site:youtube.com -site:facebook.com -site:jd.com -site:yahoo.com -site:wikipedia.org -site:amazon.com -site:netflix.com -site:reddit.com -site:live.com -site:zoom.us -site:okezone.com -site:alipay.com -site:instagram.com -site:twitch.tv -site:csdn.net -site:blogspot.com -site:microsoft.com -site:bing.com -site:github.com -site:tribunnews.com -site:myshopify.com -site:office.com -site:panda.tv -site:stackoverflow.com -site:ebay.com -site:bongacams.com -site:livejasmin.com -site:babytree.com -site:naver.com -site:apple.com <search query>

coronadisaster · on May 16, 2020

your query would obviously be too long for Google

peel40 · on May 18, 2020

yeah, that's sad. Google limits its queries to just 32 words.

bhartzer · on May 16, 2020

There is a custom search engine called Newgle.xyz that only shows results from the 1000 or so new gTLDs (new top level domains).

It’s custom google search results, but since it’s excluding .com, .net, .org etc then you probably won’t see any of the large sites there.

It’s also interesting to see which sites have been built in the last few years, as the new gTLDS haven’t been around that long.

rkagerer · on May 16, 2020

I would like one that punishes sites with too much ad to content ratio.

loosetypes · on May 16, 2020

What are folks’ non-commoditized heuristics for finding new things online?

I was intrigued by how dorkwood’s approach has changed over time, as described in a reply to a sibling comment.

As general search results get watered down and rotten tomato inflation maybe trends towards reflecting company interests rather than my interest-level, maybe it’s worth re-evaluating the vetting avenues we take as users.

Here’s mine: for games and shows I’ve recently found myself using quantity of fan-videos on YouTube as a proxy for quality. So far it’s been a decent means to find cult followings for something I otherwise wouldn’t necessarily hear about.

Obviously this approach has its flaws - and is subject to financial perversions to an extent - but I figure if enough people genuinely want to pay tribute to a work, it might be worth checking out.

bluishgreen · on May 16, 2020

How'd you find the quantity of fan videos?

Personal trick: I follow reaction video blogs, and if they are reacting to something then it is usually worth watching. But reaction blogs are only for short videos and other short form content.

ChrisMarshallNY · on May 16, 2020

Remember sites like stumbleupon?

I find that the YouTube sidebar is useful for me to find interesting music. I have eclectic tastes, and Google seems to have figured that out. I don't mind.

I suspect that it would be possible to create a custom API query to Google that would have a "blacklist."

smsm42 · on May 17, 2020

There's Million Short: https://millionshort.com/

I think they try to do exactly what you ask, but I haven't used them extensively so I don't know how good they are.

abarrettwilsdon · on May 18, 2020

For more queries, you can add modifiers to a Google Search to get the results you want

Seeing folks mention the NOT operator (-). It's quite powerful! For example, you can do:

intext:"Powered by intercom" -site:intercom.com will find all the sites that use the Intercom widget

or ~blog bread baking -inurl:checkout -intext:checkout will find bread blogs (or similar) without commercial intent

I put together a list of the two dozen or so most useful templates of this, for folks who are interested: https://www.alec.fyi/dorking-how-to-find-anything-on-the-int...

dhbradshaw · on May 16, 2020

I've wondered too about something similar to that. Basically, I'd like sessions for searching.

Each session would have an updatable list of sites that are favored, whitelisted or blacklisted for a particular class of search.

maayank · on May 16, 2020

I'm intrigued by actual use-cases for it except exploring, i.e. where it would give better result for a query than the common search engines.

Anyone reading this, please post if you find any

crocodiletears · on May 16, 2020

Big ones:

1. Looking for niche domain or institutional/social knowledge produced by experts or insiders for an informed audience that isn't necessarily available in a scientific journal.

Especially with respect to the social sciences and literary analysis, there's a wealth of intelligent commentators that don't surface well on Google without very specific search terms, and the willful subtraction of domains like quora, medium, and tumblr.

They're usually contained on poorly maintained WordPress sites that the author has long-since forgotten about, or as invalid, handcoded html docs hidden in the personal subdomains of university professors and students.

2. Finding online communities that aren't a part of Reddit or a similarly prominent platform

chasd00 · on May 16, 2020

in the same vein, it would be awesome to search for a product to buy with the results being ecomm websites owned by people in my area. A way to "shop local" online.

technotarek · on May 16, 2020

ATTIC: A visual search and discovery engine to help you find the latest products from small, unique businesses near you.

https://attic.city/

Currently for three product tiers (furniture, home decor, and fashion/clothing) in 14 major US markets, where stores within ~100 miles or a ~2 hour drive are considered as part of the market.

Disclaimer: I'm one of the founders.

derision · on May 16, 2020

How do you curate the stores?

technotarek · on May 16, 2020

Aside from the constraints we apply to market/geography and product type? If that's what you mean, then technically it's a matter of whether the store's ecom platform is compatible with our indexer, which supports ~20 different platforms (and hundreds of variations). Otherwise, we do some light curating for product quality to include, but not limited to, the accuracy of meta data (titling, description) and image quality.

fomine3 · on May 18, 2020

I'm curious how English speakers (especially those not in the US?) find national shopping websites. I live in Japan, so I never come across foreign websites written in Japanese.

alibaba_x · on May 16, 2020

Shopify released a mobile app recently for this exact purpose.

technotarek · on May 16, 2020

...but only for stores using Shopify as their ecom platform. That misses A LOT of what's out there, probably 75% or more.

amelius · on May 16, 2020

If only Google allowed us to omit websites from search results.

Google says they need our information to "improve our experience", but we can't tell them what to omit ...

rsoto · on May 16, 2020

They used to allow that, it was very useful. But as with almost everything Google does, they killed it.

fedede · on May 17, 2020

Hey! I actually liked this idea and I'm considering starting a learning project on it. I've seen a lot of interest and ideas in the comments, and decided to create a very short Google form to start gathering all the interested people so we can organise something interesting. Is anybody in? :)

https://forms.gle/5KuTYVdYaMzRD2n78

jsgo · on May 16, 2020

I don't know that I'd want a search engine to specifically exclude or limit the results of specific sites of their choosing (even if top 500 as the example is fairly unbiased), but I think I'd really like the ability to say "move these specific domains a few pages back. Don't eliminate them outright, but I have felt dumber having read them previously."
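
Re-ranking like that is simple to sketch if you have the ordered result URLs in hand. The Python below is only an illustration: `demote`, `pages_back` and `page_size` are names I made up, and it assumes 10 results per page.

```python
from urllib.parse import urlparse


def demote(results, demoted_domains, pages_back=3, page_size=10):
    """Re-rank result URLs so demoted domains land roughly `pages_back` pages later."""
    penalty = pages_back * page_size

    def is_demoted(url):
        host = urlparse(url).hostname or ""
        # Match the domain itself and any of its subdomains.
        return any(host == d or host.endswith("." + d) for d in demoted_domains)

    keyed = []
    for rank, url in enumerate(results):
        flagged = is_demoted(url)
        # On a tie, results that were not demoted come first.
        keyed.append((rank + (penalty if flagged else 0), flagged, rank, url))
    return [url for _, _, _, url in sorted(keyed)]
```

Nothing is removed: if the list is shorter than the penalty, the demoted results simply end up at the back in their original relative order.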

pengstrom · on May 16, 2020

What I want is to filter out commercial results. When I'm searching for a product I don't want shills, I want real opinions.

third_I · on May 16, 2020

And then came undisclosed sponsorship and that difference blurred more than ever...

21xhipster · on May 16, 2020

https://cyber.page

It's kinda new so it excludes kinda everything :-) But you can make it work better :-)

https://ipfs.io/ipfs/QmQ1Vong13MDNxixDyUdjniqqEj8sjuNEBYMyhQ...

Aeolun · on May 16, 2020

Can you explain to me what this is (or is meant to be)? It doesn’t appear to have a search field, though the lightning effects are impressive.

freefriedrice · on May 16, 2020

Why exclude the biggest websites?

The problem I see on DDG & Google is having to scroll 5-10 pages of utter SEO nonsense.

"Do you have a question about ____? Many ask about _______. ____ is a common question, here the are we some answer. [sic]".

Just utter garbage pages.

It used to be just with recipes or medical questions, but now it feels like most everything that is a general query.

dredmorbius · on May 16, 2020

https://news.ycombinator.com/item?id=22792243

piusp · on May 16, 2020

I have used copernik in the past. It was a collection of search engines, listing more than 140 of them. It combined the search results and sorted them by the % of keywords matched. It also had a lot of built-in tools for validating the links, copying the selection, sorting and sharing the results. Simply amazing results.
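
How that tool actually scored results is not something I know, but the general idea of merging several engines' results and sorting by the share of keywords matched can be sketched in Python. `keyword_match_ratio` and `merge_results` are hypothetical names, and each engine is assumed to return a list of (url, snippet) pairs.

```python
import re


def keyword_match_ratio(query, text):
    """Fraction of distinct query keywords that appear as whole words in text."""
    keywords = set(re.findall(r"\w+", query.lower()))
    if not keywords:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    return len(keywords & words) / len(keywords)


def merge_results(query, result_lists):
    """Combine (url, snippet) results from several engines, best match first."""
    best = {}
    for results in result_lists:
        for url, snippet in results:
            score = keyword_match_ratio(query, snippet)
            # Keep the best score seen for a URL returned by multiple engines.
            if score > best.get(url, -1.0):
                best[url] = score
    # Highest ratio first; ties broken alphabetically by URL for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Matching is on exact words only, so "tomato" would not count as a hit for "tomatoes". A real tool would presumably do some stemming.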

wyck · on May 16, 2020

Google search is so sad these days, all results are media conglomerates, it's completely counter to the core reason why the internet existed. I really hope by catering to these mega corps that they are completely undermining their brand and someone else comes along and pulls the rug out from underneath them.

If anyone noticed, during the first couple of days of covid, google search was free from large media results: the algorithm reverted back to how it was years ago and it was such a breath of fresh air. Of course they fixed the algo immediately, and it went back to only showing curated media results. There was an anon google employee who posted why this occurred.

fermienrico · on May 16, 2020

I think we are gonna see aversion from going public. Companies like Stripe and SpaceX are gonna stay private for a long time.

When SEC laws, shareholder interest, quarterly performance and stock volatility come into play, corporations become this mindless soulless monster that will devour everything in its way and fuck consumers in every which way.

Democratization of funds from central authority to public creates disincentives and the shareholders don’t give a shit about many auxiliary things such as environmental concerns. Bottom line always matters.

It’s not just google but any public corporation. Can you imagine SpaceX being able to operate with the same passion with shareholder interests?

texasbigdata · on May 16, 2020

It's not, I would argue, those things.

It is, potentially, the compensation plans. If you go to the proxy document and look at how comp plans are set, they usually hire a consultant, and "best practices" drivers are cash + big bonus based on typically some TSR (total shareholder return metric).

So for google, "don't be evil" is what's written down, but for the top execs "sell ads" is what gets them paid out before they retire. And those senior level "lifers" are, what, 40 now?

Don't really have proof to support these claims though.

Shared404 · on May 17, 2020

Not even sure "don't be evil" is still written down. [1]

[1] https://gizmodo.com/google-removes-nearly-all-mentions-of-do...

koheripbal · on May 16, 2020

I find the only way to get useful results from google is to use the "site:xyz.com" or "related:" options.

om42 · on May 16, 2020

Do you have a link to that thread where an anon google employee posted on why it occurred? Interested in what happened since I didn't realize it did.

wyck · on May 18, 2020

The reason given: legit covid information was not getting through during the first few days of the crisis, so they basically made an emergency move to disable the whitelist/blacklist functionality of the algo, which reverted it to how it worked years ago.

chrischen · on May 16, 2020

It’s a hard problem because the lower down the pyramid you go the more content spam there is.

kian · on May 16, 2020

Do you have a source for the claim in the last paragraph? Would be very interested to read it.

pkamb · on May 16, 2020

I'd love a search engine that mainly searched Stack Exchange, (old.)Reddit, and some subset of blogs or single-author websites.

Especially removing Quora, Pinterest, and aggregation/reposting/SEO/affiliate blogs.

And all "product" images with a white background. Only show real photographs.

dredmorbius · on May 16, 2020

Setting specific site restrictions (only one applies at a time, e.g., "site:example.com"), or utilising DDG bang search, gets close.

Cyclone_ · on May 16, 2020

Seems like a browser plugin might be a quick and dirty way of just filtering results to achieve the goal.

social_quotient · on May 16, 2020

Are mainstream search engines kinda like a big market equity ETF or index? There are a ton of benefits, but as a negative they make price discovery difficult and give monetary allocation to companies that probably shouldn’t have it.

Just a thought experiment, curious what others think.

wmnwmn · on May 16, 2020

Maybe what we need is a return to the very beginning, namely human curated web catalogs, aka Yahoo

dluan · on May 17, 2020

I mainly use google as a reddit search engine these days. "tiki cocktail pineapple juice reddit" gives me way more than google algorithm, and plus it's kind of like human powered SEO where genuinely useful links will likely have some discussion.

rdtwo · on May 17, 2020

So I figured I’d try a few of these with “Seattle vegetable garden blog” as the keyword. Either there aren’t a lot of blogs on the topic or most search engines miss them because results are sparse and they really shouldn’t be.

ErikAugust · on May 16, 2020

A curated, searchable web directory might be a concept that could come to be these days. It would share some of its DNA with the old school web directory but also share some with a search engine.

tokyokawasemi · on May 18, 2020

I sometimes use "inurl:wordpress" when searching for travel info. This ensures more first-person blog accounts, rather than all the tripadvisor junk that's at the top.

known · on May 16, 2020

https://twitter.com/search?q=twitter&src=typed_query

moreWeed · on May 16, 2020

Man you read my mind, just started thinking about this. From a search censorship perspective, the BBS's we were building in 93 would be better than what we have now.

Nevada-Smith · on May 16, 2020

Depending on what you're looking for, try Google Scholar [1]

[1] https://scholar.google.com/

blondin · on May 16, 2020

omg yes please.

can google allow us to exclude certain sites? i was surprised to see w3schools showing up above the official documentation for pandas and numpy. this is simply ridiculous!!

jotm · on May 16, 2020

the "-" operator still works. E.g. "weird stuff I found interesting -reddit.com -youtube.com -wired.com -w3schools.com"

badrabbit · on May 16, 2020

It wouldn't be hard to remove such results using a browser extension, but you will be scrolling a lot. Maybe duckduckgo should support it, feature request?

saadalem · on May 16, 2020

Ok here is an additional idea for fun :

A search engine that shows only urls that are not indexed by google / another one that gives you the websites with lower pagerank

bmwracer · on May 16, 2020

Not sure it would yield any useful results, but have been thinking about a no-index search engine for a while to help find obscure or otherwise hidden information. If one exists or you build one let me know.

webspinner · on May 19, 2020

Showing all the lower ranked sites would be pretty cool. A lot of blogs would probably end up there, especially if they don't care about pagerank like me.