
Is there a reason you feel that that file will be respected?



Because it's on their documentation. If OP had the file and entry, and they didn't respect it, then it would be another conversation.


Is there a reason why you don't? Is it just general bitterness and cynicism? As far as I know, all major search engines respect robots.txt, and I don't see why LLM scrapers would be different.


Probably, yes. But coming from LLM scrapers, I have absolutely no faith in any of them. When one of them calls itself "open" in its name and is anything but, why would I trust them on anything after they lie in their name?

I also do not trust that Google only crawls what is allowed in robots.txt. Maybe they only use the allowed data publicly, but I have no faith that they don't keep other crawled data in their version of shadow profiles.

I do not trust Big Tech at all, and for those who do, I really don't understand why.


Bots from big companies like Amazon, which is who the author is complaining about, do tend to respect it. In fact, the documentation the GP linked to says they will. They could be lying -- but why bother?


Amazon's official documentation for Amazonbot, at https://developer.amazon.com/amazonbot, states:

> Amazonbot respects standard robots.txt rules.
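
For anyone who wants to see what "standard robots.txt rules" would mean for a compliant crawler, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, the `SomeOtherBot` agent name, and the example.com URL are all made up for illustration. It only shows how a well-behaved parser reads the rules; it says nothing about what Amazonbot actually does on the wire.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Amazonbot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("Amazonbot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/page"))
```

With a file like that in place, any requests from Amazonbot showing up in the access logs would be the "other conversation" mentioned upthread.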



