There's https://easylist.to/easylist/easylist.txt for universal content-blocking... | Hacker News


hombre_fatal on June 20, 2020 | on: RSS Box – RSS for websites that do not support RSS

There's https://easylist.to/easylist/easylist.txt for universal content-blocking rules, and youtube-dl for universal video extraction methods.

While building a feed reader of my own, I recently had an idea for a project for universal content crawling rules: how the content hierarchy is organized on each site, and how to extract the content from each content page. It would be a single community project that any other project could use to crawl websites for their content.

Looks like rss-bridge comes close to that.
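
To make the idea concrete, here is a rough sketch of what a shared rule set might look like: a table of per-site extraction rules keyed by hostname, applied to a fetched page. Everything in it is an assumption for illustration (the `SiteRule` fields, the `RULES` table, the `example.com` entry, and the use of the standard library's limited XPath support on well-formed markup); it is not how rss-bridge or any existing project does it.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class SiteRule:
    """Hypothetical per-site rule: where the title and body live on a content page."""
    title_path: str
    body_path: str


# Hypothetical community-maintained table, keyed by hostname without "www.".
RULES = {
    "example.com": SiteRule(title_path=".//h1", body_path=".//div[@class='post']"),
}


def text_of(element):
    """Join all text fragments under an element, separated by single spaces."""
    return " ".join(part.strip() for part in element.itertext() if part.strip())


def extract(url, markup):
    """Return {'title', 'body'} for a known site, or None if no rule matches."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    rule = RULES.get(host)
    if rule is None:
        return None
    # ElementTree only parses well-formed XML; real-world HTML needs a lenient parser.
    root = ET.fromstring(markup)
    title = root.find(rule.title_path)
    body = root.find(rule.body_path)
    return {
        "title": text_of(title) if title is not None else None,
        "body": text_of(body) if body is not None else None,
    }


SAMPLE = (
    "<html><body><h1>Hello</h1>"
    "<div class='post'><p>First.</p><p>Second.</p></div>"
    "</body></html>"
)
result = extract("https://www.example.com/a", SAMPLE)
```

A consumer such as a feed reader would only need the table and a function like `extract`; pages from sites without a rule return `None` and can fall through to some other strategy.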

k1m on June 20, 2020

To help extract article content, you might be interested in this collection I help maintain: https://github.com/fivefilters/ftr-site-config/

It's used, in addition to an automatic article extractor, in Full-Text RSS: http://ftr.fivefilters.org
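
As a loose illustration of how per-site config files could be consumed, the sketch below parses a plain-text file of `directive: value` lines into a dictionary. The format, the directive names (`title`, `body`, `strip`, `test_url`), and the sample config are assumptions made for this example; check the repository for the actual syntax before relying on any of it.

```python
def parse_site_config(text):
    """Parse 'directive: value' lines into a dict of lists.

    Blank lines and lines starting with '#' are skipped. A directive may
    repeat, so every value is collected into a list in file order.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs and XPath values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        config.setdefault(key.strip(), []).append(value.strip())
    return config


# Hypothetical config for an imaginary site.
SAMPLE_CONFIG = """
# example.com
title: //h1
body: //div[@class='post']
strip: //div[@class='ads']
strip: //aside
test_url: http://example.com/a
"""

parsed = parse_site_config(SAMPLE_CONFIG)
```

When no config exists for a site, a caller could fall back to an automatic extractor, which matches the "in addition to" arrangement described above.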
