To extend this a bit, the ultimate solution is a quasi-government nonprofit organization along the lines of ICANN (or maybe part of ICANN) that operates a crawler and a database of crawled sites, plus an API for that data.
This would not be a search engine per se but a neutral backend that anybody could build a search engine on top of. Want to build a search engine and sell ads? Fine. Want to build a search engine specializing in health information subsidized by the drug companies? Fine. Want to build one as a neutral nonprofit and charge subscriptions? Fine. The same backend database works for them all. Even Google could build a frontend on top of it.
Yes, it would be expensive. The government would have to pay for all the backend infrastructure (whether it buys the hardware or rents it from Amazon, Google, Microsoft, etc.), and both Cloudflare and each site's robots.txt would have to let "icanncrawler" through without friction. But it would finally allow the creation of neutral search engines not beholden to advertisers. It's a piece of infrastructure the modern Internet sorely needs.
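For what the robots.txt side of that might look like, here is a rough sketch using Python's standard urllib.robotparser. The rules below are made up for illustration: they assume a site that lets "icanncrawler" in everywhere while keeping other bots out of a hypothetical /private/ path. The can_crawl helper is also just a name I picked, and none of this covers Cloudflare's bot filtering, which is a separate layer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the shared crawler is allowed everywhere,
# every other crawler is kept out of /private/.
ROBOTS_TXT = """\
User-agent: icanncrawler
Allow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("icanncrawler", "otherbot"):
        allowed = can_crawl(agent, "https://example.com/private/page")
        print(agent, "allowed" if allowed else "blocked")
```

A site owner who wanted to opt out would flip the first rule to Disallow, so the scheme still depends on sites choosing to cooperate.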