SearX: Privacy-respecting metasearch engine

halflings · on Nov 13, 2017

Even with a pool of proxies, I would expect an instance of this "metasearch engine" to quickly get banned by the other search engines. The same IP running thousands of queries and scraping their content (which is against their ToS) should be easily detectable.

_b8r0 · on Nov 13, 2017

I've been running multiple SearX instances for goodness knows how long and this has never happened to me. I'm not aware of this happening either.

SearX uses multiple sources for queries, so you'd have to be banned by quite a few search engines to stop it being useful.

Also relevant is filtron[1], an application firewall that sits in front of SearX and rate limits searches.

[1] - https://asciimoo.github.io/searx/admin/filtron.html
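For anyone wondering what rate limiting means in practice, here is a minimal token-bucket sketch in Python. It is not how filtron is implemented or configured; the `TokenBucket` class and its `rate`, `capacity`, and `clock` parameters are made up for illustration.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper for illustration only; not part of SearX or filtron.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False  # bucket empty: the request should be rejected or delayed
```

A proxy in front of the instance would call `allow()` once per incoming search and refuse the request when it returns `False`.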

snowpanda · on Nov 13, 2017

Building on that issue, I'd like to add that it would be nice to have a feature that alerts a user that a certain search engine is denying requests. It's visible in the logs or settings somewhere, but usually I find myself wondering for a while why my search results aren't accurate before heading off to figure out why.

Still a great project though, I use it every day.

jbg_ · on Nov 13, 2017

At least for me, next to each result is a list of the engines that returned that result. I run searx through Tor, so I occasionally find that Google stops returning results for a few minutes.

It doesn't happen often, but it's easy to tell when it does because none of the first page results have "google" next to them, while of course normally most of them would.
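The same check can be automated. A rough sketch in Python, assuming each result is a dict with an `engines` list naming the engines that returned it; that field name and the `missing_engines` helper are assumptions for illustration, so check the actual output of your instance before relying on them.

```python
def missing_engines(results, expected):
    """Return the expected engines that contributed no result, sorted by name.

    `results` is assumed to be a list of dicts, each with an optional
    "engines" list (an assumption, not a documented guarantee).
    """
    seen = set()
    for result in results:
        seen.update(result.get("engines", []))
    return sorted(set(expected) - seen)


# Example: a first page where nothing came back from Google.
page = [
    {"title": "a", "engines": ["duckduckgo", "bing"]},
    {"title": "b", "engines": ["bing"]},
]
print(missing_engines(page, ["google", "bing", "duckduckgo"]))  # ['google']
```

An engine that shows up as missing on one query is not necessarily blocked, so it probably makes sense to alert only after several consecutive misses.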

jazoom · on Nov 13, 2017

I'm curious. How does DuckDuckGo do this?

hanbura · on Nov 13, 2017

By paying for API access to other search engines. Yahoo used to offer that publicly but eventually shut the service down (but kept DDG as a legacy customer)

jazoom · on Nov 13, 2017

That's interesting. I wonder what it costs.

mtmail · on Nov 13, 2017

Approximately $1 per 1000 requests https://www.programmableweb.com/news/yahoos-new-search-api-p...
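To put that rate in perspective, a back-of-the-envelope calculation in Python. The `api_cost_usd` helper is made up for illustration, and the flat $1 per 1000 requests is taken from the approximate figure above, not from an actual price list.

```python
def api_cost_usd(requests, usd_per_thousand=1.0):
    """Estimated cost in USD for a number of requests at a flat per-1000 rate."""
    return requests / 1000 * usd_per_thousand


print(api_cost_usd(50_000))     # 50.0
print(api_cost_usd(1_000_000))  # 1000.0
```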

jazoom · on Nov 13, 2017

Thanks for the link!

VoidWhisperer · on Nov 13, 2017

This is self-hosted, so I'm assuming it's running under the assumption that each person hosts their own instance and uses that instance. The number of queries coming from the instance in that case wouldn't look too out of the ordinary.

lawl · on Nov 13, 2017

Then that defeats the purpose of trying to be privacy focused if your search queries aren't mixed with other people's queries.

jbg_ · on Nov 13, 2017

I also run searx self-hosted, configured to proxy all its queries through Tor. Occasionally one of the engines doesn't return results (probably due to blocking), which is barely noticeable since several others still work, but normally all the engines including Google return results.

Since searx doesn't store cookies returned by the search engines, and I'm using it through Tor, I think this is a significant improvement over sending all my search queries to Google directly from my laptop.

y4mi · on Nov 13, 2017

I can't grasp how you got that idea. Do you not know what self-hosted means?

The engine crawls the web and saves its data locally. This locally saved data can be queried/searched. So yes, in your search engine, there will only be your own searches. But these searches are only visible to your own servers/services.

halflings · on Nov 22, 2017

> I can't grasp how you got that idea.

By reading the link?

> the engine crawls the web and saves its data locally ...

Saves an index of the whole web, locally?

This is not what SearX does. It queries other search engines.

y4mi · on Nov 22, 2017

Yes, I mixed up the engines and commented without verifying which one it was.

I'm sorry for that. I just thought it wasn't necessary to edit, as somebody had already sufficiently pointed out how mistaken I was 9 days ago.

jbg_ · on Nov 13, 2017

This is not how searx works.

saas_co_de · on Nov 13, 2017

Google will give you a CAPTCHA every once in a while, but they never actually stop you from using their service.

userbinator · on Nov 13, 2017

It will also sometimes ban you completely (not even the CAPTCHA works, solving it just gets you another) for ~2h. I've triggered it manually, usually when trying very specific queries and multiple variations in quick succession and also going through to the "end" of the result pages.

dajohnson89 · on Nov 14, 2017

Getting banned in that case must've been extremely aggravating.

amelius · on Nov 13, 2017

I wouldn't worry about it. If they get banned, they will probably apply some ML technique to circumvent any CAPTCHAs to get access again. Also, this can run from the user's computer, so it would actually be quite hard to detect that the results are being aggregated and stripped of ads.

snowpanda · on Nov 13, 2017

On a related note, have any of you tried FindX?

https://www.findx.com/

https://github.com/privacore/open-source-search-engine

Looks promising but haven't used it much yet.

O1111OOO · on Nov 13, 2017

I like this part "it draws its results from its own bot that crawls the web". There aren't too many that use their own bot.

Tsignal (https://deepsearch.tsignal.io/) is another that uses its own bot with a little AI tossed into the mix. And... it's currently not accessible :(

Wanted to add that I'm currently on Opera and using an extension named Search All[0]. After conducting a search via your default search engine, this extension places a bar with a list of user configurable search engines. This allows you to search alternatives easily using the same keywords.

One great feature is that if you go directly to a search engine, click on the Search All icon, it almost always identifies it with correct parameters and can be easily added. Just added findX to my bar (for testing).

I plan on going back to Firefox and wish FF had something like this (part of the reason I'm posting this).

[0] https://addons.opera.com/en/extensions/details/search-all/?d...

fghtr · on Nov 13, 2017

There is also http://yacy.net, a peer-to-peer distributed free search engine.

finnn · on Nov 13, 2017

I wonder what the rationale is behind listing the site's CA in the public instance list (https://github.com/asciimoo/searx/wiki/Searx-instances).

bussie · on Nov 13, 2017

Does this have any advantages over StartPage?

_phaq · on Nov 13, 2017

* Can be self-hosted.

* Queries other search engines besides just Google.

* Has more search options.

tonysdg · on Nov 13, 2017

Or DuckDuckGo, for that matter?

saas_co_de · on Nov 13, 2017

This is awesome. I have been wanting to build something like this for a while but never had the time.

ReverseCold · on Nov 13, 2017

There's also pears search, which died sometime after being funded by Mozilla. I don't think it was malicious intent, but no one can explain why the project is inactive.