
Why GPT-based then? There are libraries that do this: you give examples, they generate the rules for you, and you get back a scraper object that takes any HTML and returns the scraped data.

Mine: https://github.com/lorey/mlscraper
Another: https://github.com/alirezamika/autoscraper
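
To make the "give examples, get rules" idea concrete, here is a toy sketch using only the Python standard library. It is not the API or the algorithm of either library linked above; `train`, `scrape`, and the tag/class path rule are hypothetical names and simplifications made up for illustration. It records the tag/class path of each example value on a sample page, then returns every text node with the same path on other pages.

```python
from html.parser import HTMLParser

# Elements without a closing tag; skipped so they do not pollute the path.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _PathCollector(HTMLParser):
    """Records (tag/class path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((" > ".join(self.stack), text))


def _paths(html):
    collector = _PathCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def train(html, examples):
    """Learn one path rule for each example value found on the sample page."""
    rules = []
    for path, text in _paths(html):
        if text in examples and path not in rules:
            rules.append(path)
    return rules


def scrape(html, rules):
    """Return every text node whose path matches a learned rule."""
    return [text for path, text in _paths(html) if path in rules]


# Hypothetical sample pages, made up for this sketch.
sample_page = (
    '<ul><li class="item"><span class="name">Widget</span>'
    '<span class="price">$5</span></li></ul>'
)
other_page = (
    '<ul><li class="item"><span class="name">Gadget</span>'
    '<span class="price">$7</span></li>'
    '<li class="item"><span class="name">Gizmo</span>'
    '<span class="price">$9</span></li></ul>'
)

rules = train(sample_page, ["Widget"])
names = scrape(other_page, rules)
```

Here one example ("Widget") is enough to pick up both product names on the second page while skipping the prices. Real libraries presumably need to be far more robust to markup changes than an exact path match.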




Great projects, thank you for the links. On a brief scan, neither covers paging/loops, or JS frameworks where one would need a headless browser and have to wait for content to load. That is where a low/lazy code solution might provide the most added value.
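
For reference, the paging part on its own can be sketched as a small loop. This is only a rough illustration under stated assumptions: `crawl_pages`, `fetch`, and `extract` are hypothetical names, the "next page" link is assumed to be an `<a rel="next">` element, and `fetch` is any callable that returns HTML for a URL (it could wrap a headless browser that waits for content to load, which is not shown here).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _NextLinkFinder(HTMLParser):
    """Finds the href of the first <a rel="next"> on a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "next" and self.href is None:
            self.href = attrs.get("href")


def crawl_pages(start_url, fetch, extract, max_pages=50):
    """Follow rel="next" links, collecting extract(html) from each page.

    fetch(url) -> html and extract(html) -> list are supplied by the caller.
    """
    results, seen, url = [], set(), start_url
    # Stop on a missing next link, a repeated URL, or the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        results.extend(extract(html))
        finder = _NextLinkFinder()
        finder.feed(html)
        url = urljoin(url, finder.href) if finder.href else None
    return results


# Fake two-page site standing in for real network access.
fake_site = {
    "http://example.test/p1": '<p>a</p><a rel="next" href="/p2">next</a>',
    "http://example.test/p2": "<p>b</p>",
}


def extract_paragraphs(html):
    return re.findall(r"<p>(.*?)</p>", html)


demo_result = crawl_pages(
    "http://example.test/p1", fake_site.__getitem__, extract_paragraphs
)
```

Keeping `fetch` as a parameter means the loop stays the same whether pages come from a plain HTTP client or from a browser session.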



