Is there a reason you feel that that file will be respected?

kinduff · 2024-06-03T18:51:33 1717440693

Because it's on their documentation. If OP had the file and entry, and they didn't respect it, then it would be another conversation.

lambdaxyzw · 2024-06-03T18:52:41 1717440761

Is there a reason why you don't? Is it just general bitterness and cynicism? As far as I know, all major search engines respect robots.txt, and I don't see why LLM scrapers would be different.

dylan604 · 2024-06-03T19:11:24 1717441884

Probably, yes. But when it comes to LLM scrapers, I have absolutely no faith in any of them. When one of them calls itself "open" in its name and is anything but, why would I trust them with anything after they lie in their name?

I also do not trust that Google only crawls what is allowed in robots.txt. Maybe they only use the allowed data in public, but I have no faith that they don't keep the rest of the crawled data in their version of shadow profiles.

I do not trust Big Tech at all, and for those who do, I really don't understand why.

kemayo · 2024-06-03T18:53:22 1717440802

Bots from big companies like Amazon, which is who the author is complaining about, do tend to respect it. In fact, the documentation that the GP linked to says that they will. They could be lying -- but why bother?

fragmede · 2024-06-03T19:01:43 1717441303

Amazon's official documentation for Amazonbot, at https://developer.amazon.com/amazonbot, states:

> Amazonbot respects standard robots.txt rules.
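
For anyone who wants to see what "respecting standard robots.txt rules" would mean for their own file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes the crawler's user-agent token is `Amazonbot`; the robots.txt contents, the `is_allowed` helper, and the `example.com` URL are made up for illustration. It only shows how a compliant parser would read the rules, not whether any particular bot actually follows them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Amazonbot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"  # placeholder URL
    print("Amazonbot allowed:", is_allowed(ROBOTS_TXT, "Amazonbot", page))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

If the first check comes back as not allowed and the bot still shows up on those paths in your access logs, that would be the "another conversation" case mentioned upthread.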