
Yeah, this does work as long as the scraper respects robots.txt.
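For anyone wondering what "respects robots.txt" looks like on the scraper side, here's a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example.com URL are made up for illustration; GPTBot is just used as an example user agent, and none of this claims to be how any particular crawler actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Illustrative helper (not from the thread): a polite scraper would
    call something like this before every request.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

The catch is that the check lives entirely in the scraper. Nothing on the server enforces it, so a crawler that skips this step gets the page anyway.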

But don't OpenAI and other companies use third-party datasets? Sure, they do plenty of scraping, but I'd bet for some stuff it's cheaper to buy the dataset and then clean up the data.


