Ask HN: Web-scraping – Do patterns/recipes exist for common scraping targets?
10 points by kisamoto on Sept 11, 2020 | 7 comments
I'm fairly familiar with web scraping/crawling, but I was wondering whether there is a company or tool that offers re-usable modules for scraping common websites.

Examples could include: scraping article texts from news websites; extracting recipes from Good Food etc.

Rather than rewriting what others have already written, is there an existing library of these scrapers/crawlers that can be used 'out of the box'?
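To make the idea concrete, here is a minimal sketch of what such a library of per-site "recipes" might look like. It is only an illustration under stated assumptions, not an existing tool: the domains `news.example.com` and `recipes.example.com`, the `RECIPES` registry, and the `recipe`, `text_of` and `scrape` helpers are all hypothetical names, and the tag-based extraction assumes well-formed markup with explicit closing tags. It uses only the Python standard library and takes already-downloaded HTML, so fetching is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TagTextCollector(HTMLParser):
    """Collect the text found inside each occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.chunks = []    # one string per top-level occurrence

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.chunks.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks[-1] += data


def text_of(html, tag):
    """Return whitespace-normalised text of every <tag> element in html."""
    collector = _TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return [" ".join(chunk.split()) for chunk in collector.chunks if chunk.strip()]


# Hypothetical registry: hostname -> extractor function.
RECIPES = {}


def recipe(domain):
    """Decorator that registers an extractor for one hostname."""
    def register(func):
        RECIPES[domain] = func
        return func
    return register


@recipe("news.example.com")  # placeholder domain, not a real target
def extract_article(html):
    headings = text_of(html, "h1")
    return {"title": headings[0] if headings else None,
            "paragraphs": text_of(html, "p")}


@recipe("recipes.example.com")  # placeholder domain, not a real target
def extract_recipe(html):
    headings = text_of(html, "h1")
    return {"name": headings[0] if headings else None,
            "ingredients": text_of(html, "li")}


def scrape(url, html):
    """Pick the extractor registered for the URL's hostname and run it."""
    host = urlparse(url).hostname
    if host not in RECIPES:
        raise LookupError(f"no recipe registered for {host!r}")
    return RECIPES[host](html)
```

Calling `scrape("https://news.example.com/some-story", html)` would return a dict with `title` and `paragraphs`. A real recipe would need site-specific selectors and would break whenever the site changes its markup, which is probably a large part of why a shared library like this is hard to maintain.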



Not exactly what you're looking for, but there's an OSS Chrome extension that lets you record your actions in the browser and transcribes them into Nightmare.js code:

https://github.com/segmentio/daydream

That's probably the best you're going to get. Most things worth scraping are worth money, and as such are not freely available.


For extracting news articles: https://newspaper.readthedocs.io/en/latest/


While it's not a set of general-purpose templates, that seems exceptionally useful for news articles.



I know ScrapingHub has some useful tools, but as far as I know they don't have a library of off-the-shelf scrapers for popular websites.


What you're looking for sounds very open-ended, but the closest thing I can think of is the Huginn project on GitHub.


This is sorely needed IMO.



