
I'd love a GPT-based solution that, given inputs similar to those used by scrapeghost, would not do the actual scraping but instead output a recipe for one of the popular scraping libraries or services, taking care of figuring out the XPaths and the loops for pagination.



Why GPT-based then? There are libraries that do this: You give examples, they generate the rules for you and give you a scraper object that takes any html and returns the scraped data.

Mine: https://github.com/lorey/mlscraper Another: https://github.com/alirezamika/autoscraper
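To make the "give examples, get a scraper" idea concrete, here is a toy sketch using only the Python standard library. It is not how mlscraper or autoscraper work internally, and `train` and `scrape` are hypothetical names made up for this illustration. It assumes the rule to learn is simply the chain of tags and classes leading to each example value.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class _PathCollector(HTMLParser):
    """Records (tag path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((tuple(self.stack), text))


def _paths(html):
    parser = _PathCollector()
    parser.feed(html)
    parser.close()
    return parser.found


def train(html, wanted):
    """Learn rules: the tag paths under which the example values appear."""
    return {path for path, text in _paths(html) if text in wanted}


def scrape(html, rules):
    """Apply learned rules to any page with the same structure."""
    return [text for path, text in _paths(html) if path in rules]
```

You would call `rules = train(sample_html, {"Foo"})` once on a page where you know the value you want, then `scrape(other_html, rules)` on any page with the same layout. Real libraries generalize much more robustly than an exact path match.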


Great projects, thank you for the links. On a brief scan, neither covers paging/loops - or JS frameworks, where one would need to use a headless browser and wait for content to load, and where a low/lazy code solution might provide the most added value.
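For reference, the paging loop itself can be sketched in a few lines of standard-library Python. This assumes the site marks its "next page" link with `rel="next"`, which many sites do not. `crawl_pages` and the `fetch` callable are hypothetical names for this illustration; for JS-rendered pages, `fetch` would have to be backed by a headless browser, which this sketch does not cover.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _NextLink(HTMLParser):
    """Finds the href of the first <a rel="next"> on a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").split()
        if tag == "a" and self.href is None and "next" in rel:
            self.href = a.get("href")


def crawl_pages(start_url, fetch, max_pages=50):
    """Yield (url, html) for each page, following rel="next" links.

    `fetch` is any callable mapping a URL to its HTML (an assumption of
    this sketch). Stops on a missing link, a repeated URL, or max_pages.
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        finder = _NextLink()
        finder.feed(html)
        url = urljoin(url, finder.href) if finder.href else None
```

Each yielded page can then be passed to whatever extraction rules you already have.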



