Even with a pool of proxies, I would expect an instance of this "metasearch engine" to quickly get banned by the other search engines. The same IP running thousands of queries and scraping their results (which is against their ToS) should be easy to detect.
Building on that issue, I'd like to add that it would be nice to have a feature that alerts the user when a certain search engine is denying requests. It's visible in the logs or settings somewhere, but usually I find myself wondering for a while why my results seem off before heading off to figure out why.
At least for me, next to each result is a list of the engines that returned it. I run searx through Tor, so I occasionally find that Google stops returning results for a few minutes.
It doesn't happen often, but it's easy to tell when it does because none of the first page results have "google" next to them, while of course normally most of them would.
By paying for API access to other search engines. Yahoo used to offer that publicly but eventually shut the service down (though it kept DDG on as a legacy customer).
This is self-hosted, so presumably each person hosts their own instance and uses that instance. The number of queries coming from any one instance wouldn't look out of the ordinary.
I also run searx self-hosted, configured to proxy all its queries through Tor. Occasionally one of the engines stops returning results (probably due to blocking), which is barely noticeable since several others still work; normally all the engines, including Google, return results.
Since searx doesn't store cookies returned by the search engines, and I'm using it through Tor, I think this is a significant improvement over sending all my search queries to Google directly from my laptop.
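For anyone who wants to replicate the setup: in searx this is just a couple of lines in settings.yml. Roughly the following (key names from memory, so check your version's docs; it assumes Tor's SOCKS listener on the default 127.0.0.1:9050):

    outgoing:
        proxies:
            # socks5h (not socks5) so DNS lookups also go through Tor
            http: socks5h://127.0.0.1:9050
            https: socks5h://127.0.0.1:9050

The socks5h scheme is the important bit; plain socks5 would resolve DNS locally and leak which engines you're querying.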
I can't grasp how you got that idea. Do you not know what self-hosted means?
The engine crawls the web and saves its data locally.
This locally saved data can then be queried/searched.
So yes, in your search engine there will only be your own searches. But those searches are only visible to your own servers/services.
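To make the idea concrete, the whole loop fits in a few lines. A toy sketch in Python (hypothetical code, not from any real project; no link-following, robots.txt, or rate-limiting) just to show that crawling, indexing, and querying can all stay on one machine:

    import re
    import urllib.request
    from collections import defaultdict

    # word -> set of URLs containing it (the "locally saved data")
    index = defaultdict(set)

    def crawl(url):
        # Fetch one page and index its words; a real crawler would
        # also follow links, respect robots.txt, and rate-limit.
        html = urllib.request.urlopen(url, timeout=10).read()
        text = re.sub(r"<[^>]+>", " ", html.decode("utf-8", "replace"))
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)

    def search(query):
        # Answered entirely from the local index; the query never
        # leaves this machine.
        words = query.lower().split()
        if not words:
            return []
        return sorted(set.intersection(*(index[w] for w in words)))

    crawl("https://example.com/")
    print(search("example domain"))

The queries only ever touch the local index; the only traffic anyone else sees is the crawler fetching pages.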
It will also sometimes ban you completely for ~2h (not even the CAPTCHA works; solving it just gets you another). I've triggered it myself with manual searching, usually when trying very specific queries with multiple variations in quick succession and paging through to the "end" of the result pages.
I wouldn't worry about it. If they get banned, they will probably apply some ML technique to circumvent any CAPTCHAs and get access again. Also, this can run from the user's computer, so it would actually be quite hard to detect that the results are being aggregated and stripped of ads.
I like this part: "it draws its results from its own bot that crawls the web". There aren't many engines that use their own bot.
Tsignal (https://deepsearch.tsignal.io/) is another that uses its own bot with a little AI tossed into the mix. And... it's currently not accessible :(
Wanted to add that I'm currently on Opera and using an extension named Search All[0]. After you run a search in your default search engine, this extension adds a bar with a user-configurable list of search engines, so you can easily rerun the same keywords on alternatives.
One great feature: if you go directly to a search engine and click the Search All icon, it almost always identifies the engine with the correct query parameters so it can easily be added. I just added findX to my bar (for testing).
I plan on going back to Firefox and wish FF had something like this (part of the reason I'm posting this).
There's also Pears search, which died sometime after being funded by Mozilla. I don't think there was any malicious intent, but no one can explain why the project went inactive.