Show HN: A search engine based on RSS feed (github.com/dato-ai)
134 points by daviducolo on Sept 15, 2022 | 23 comments



Love the idea behind this.

Generally any new way of attempting to find signal among the noise of the internet is good, and I think RSS is niche enough to automatically remove a ton of noise from the dataset, and the fact this will mostly target articles and blogs gives it a distinct flavor.

Gonna put this with the Marginalia astrolabe in my "small web search" bookmark folder.


What other stuff do you have in this 'folder'?


Not to disappoint, but currently Marginalia and now this. (It used to just be Marginalia, no idea why it was a folder)

Though now that there's two I'm probably gonna pick up a few more from here that I like: https://seirdy.one/posts/2021/03/10/search-engines-with-own-...


I like this idea a lot but this specific project is done pretty badly. There's a lot of spam and noise in results, and no real filtering system in place.

It would be awesome to create something like this from a massive archive of personal blogs in spaces like tech, development, design, etc. Basically, a massive RSS reader curated by the community itself.


I was very excited by this and it's promising, but it needs a bit more work to be usable. I was hoping the RSS technique would help avoid the SEO and blogspam that Google is now filled with. Unfortunately, I still get quite a bit of that (my query was "espresso machine" and I got a bunch of listicles, etc.).

I think applying a layer of curation on top would go a long way to fixing this.


Quick take: wouldn't it potentially compound the blogspam? WordPress installs are probably one of the largest providers of RSS feeds these days.


Yeah, and most SEO spam blogs are WP-powered.


That's nice. Will it start crawling for new feeds, or will they be manually curated?


Each user can add the feeds they prefer, and the system then automatically imports the data daily.
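
For illustration only, here is a minimal sketch of what a daily import step like this could look like. It assumes plain RSS 2.0 and uses only Python's standard library. The sample feed, `parse_rss_items`, and `import_new_items` are hypothetical names made up for this example, not taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for one a user has submitted.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Short summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Short summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return title, link, and description for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def import_new_items(xml_text, seen_links):
    """Return only items whose link has not been imported yet, and record them as seen."""
    new_items = []
    for item in parse_rss_items(xml_text):
        if item["link"] and item["link"] not in seen_links:
            seen_links.add(item["link"])
            new_items.append(item)
    return new_items


seen = set()
first_run = import_new_items(SAMPLE_FEED, seen)   # both items are new
second_run = import_new_items(SAMPLE_FEED, seen)  # nothing new on the next day's run
```

Tracking links already seen is one simple way to make a repeated daily run idempotent. A real importer would also need to fetch the feed over HTTP and handle Atom feeds.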


Great! Also, good insight for basing a search engine on RSS. It is naturally spam resistant: SEO people and other spammers don't like feeds because they cannot bring them clicks and ad views, so no incentive for spam.


This is indeed the dominant view so the observation is true, but the history here baffles me; web feeds aren’t required to include the full content. Ad-funded publishers could easily just include links and descriptions—the same as they’re content to do for Open Graph previews—and continue to benefit from web feeds instead of apparently writing off the technology entirely.



Very cool discovery, and your biztoc.com work as well!


very nice!


Super cool project!

I tried solving the search quality problem (for technical content like engineering blogs) a while back by filtering with heuristics based on websites/URLs. At one point I was experimenting with RSS, but found that many websites only expose the last X entries in their feed, so I gave up on that direction since it excluded too much content.

Are you seeing a similar problem now?
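
As a rough illustration of website/URL-based filtering in general, here is a minimal sketch. The domain allowlist, the skipped path fragments, and the `keep_url` function are all hypothetical and are not the heuristics described in the comment above.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of sites considered worth indexing.
ALLOWED_DOMAINS = {"eng.example.com", "blog.example.org"}

# Hypothetical path fragments that usually indicate non-article pages.
SKIPPED_PATH_PARTS = ("/tag/", "/category/", "/jobs/")


def keep_url(url):
    """Keep a URL only if its host is allowlisted and its path does not look like a listing page."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in ALLOWED_DOMAINS:
        return False
    path = parsed.path.lower()
    return not any(part in path for part in SKIPPED_PATH_PARTS)


candidates = [
    "https://www.eng.example.com/posts/scaling-queues",
    "https://eng.example.com/jobs/backend-engineer",
    "https://spam.example.net/posts/best-espresso-machines",
]
kept = [u for u in candidates if keep_url(u)]
```

Filters like this are cheap and easy to reason about, but they only work as well as the hand-maintained lists behind them.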


Did you get it running?


Fantastic, great idea! The amount of job offers when searching for 'software engineering' is a bit annoying, though.


The major problem is the selection of the RSS feeds. If you look at the list of feeds you see a lot of traditional news feeds like CNN or New York Post. These are well covered in Google/Google News and are fairly low quality.


Yes, you are right, but the list is chosen by the users. I am working on an algorithm for ranking the entries in order to get higher-quality results.


> At the moment is not possible to add source Feed if you have feed proposals open an issue with the URLs to add

What are the long-term plans for this? Building a crawler? Or just manual curation?


It is just a crawler.


So far, a few searches haven't shown a single relevant result, just a lot of strange spam.


awesome, very cool and very fast!



