
There's https://easylist.to/easylist/easylist.txt for universal content-blocking rules, and youtube-dl for universal video extraction.

While building a feed reader of my own, I recently had an idea for a similar project for universal content-crawling rules: how the content hierarchy is organized on each site, and how to extract the content from each page. It would be a single community project that any other project could use to crawl websites for their content.
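
To make the idea a bit more concrete, here is a rough sketch of what a shared rule set and a consumer of it might look like. Everything in it is made up for illustration: `SiteRule`, `RULES`, `extract`, the field names, and the `example.com` entry are hypothetical and not taken from any existing project. It uses only the Python standard library.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteRule:
    """Hypothetical per-site rule: which elements hold the title and body."""
    title_tag: str
    body_tag: str
    body_class: str


# Hypothetical community-maintained rule set, keyed by hostname.
RULES = {
    "example.com": SiteRule(title_tag="h1", body_tag="div", body_class="article"),
}


class _Extractor(HTMLParser):
    """Collects text inside the title element and the body element of a rule."""

    def __init__(self, rule):
        super().__init__()
        self.rule = rule
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._body_depth = 0  # nesting depth of body_tag once inside the body

    def handle_starttag(self, tag, attrs):
        if self._body_depth:
            # Already inside the body: track nested elements of the same tag.
            if tag == self.rule.body_tag:
                self._body_depth += 1
        elif tag == self.rule.body_tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.rule.body_class in classes:
                self._body_depth = 1
        if tag == self.rule.title_tag:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == self.rule.title_tag:
            self._in_title = False
        if self._body_depth and tag == self.rule.body_tag:
            self._body_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        if self._body_depth:
            self.body_parts.append(data)


def extract(url, html):
    """Return {'title', 'body'} for a known site, or None if no rule exists."""
    rule = RULES.get(urlparse(url).hostname)
    if rule is None:
        return None
    parser = _Extractor(rule)
    parser.feed(html)
    parser.close()
    # Text chunks are joined with spaces and whitespace is collapsed.
    return {
        "title": " ".join(" ".join(parser.title_parts).split()),
        "body": " ".join(" ".join(parser.body_parts).split()),
    }
```

A feed reader would call `extract(url, html)` after fetching a page and fall back to some generic extractor when it returns `None`. Real rules would probably need something richer than tag-plus-class matching (XPath or CSS selectors, for instance), plus rules for the listing pages that make up the content hierarchy.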

Looks like rss-bridge comes close to that.




To help extract article content, you might be interested in this collection I help maintain: https://github.com/fivefilters/ftr-site-config/

It's used, in addition to an automatic article extractor, in Full-Text RSS: http://ftr.fivefilters.org



