
> AFAIK OpenAI currently respects robots.txt

I wonder to what degree -- do they respect the Crawl-delay directive, for instance? HN itself has a 30-second crawl-delay (https://news.ycombinator.com/robots.txt), meaning that crawlers are supposed to wait 30 seconds between requests. I doubt ChatGPT will delay a user's search of HN by up to 30 seconds, even though that's what robots.txt instructs it to do.
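For anyone who wants to see what honoring that directive looks like in practice, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt text, the "ExampleBot" user agent, the example.com URL and the crawl_policy helper are all made up for illustration; this is not HN's actual file, and it says nothing about how OpenAI's crawler really behaves.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a copy of any real site's file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def crawl_policy(robots_txt: str, user_agent: str, url: str):
    """Hypothetical helper: return (allowed, crawl_delay_seconds) for a URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


allowed, delay = crawl_policy(
    SAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/item?id=1"
)
blocked, _ = crawl_policy(
    SAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"
)
print(allowed, delay, blocked)
```

A polite crawler would then sleep for `delay` seconds between requests to the same host, which is exactly the wait an interactive search is unlikely to tolerate.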



Would ChatGPT even have to respect robots.txt when it is interacting live with a user? I would think robots.txt only applies to automatic crawling. When directed by a user, one could argue that ChatGPT is basically the user agent the user is using to view the web. If you wanted to write a browser extension that shows the reading time for all search results on Google, would you respect robots.txt when prefetching all pages from the results? I probably wouldn’t, because that’s not really automated crawling to me.



