Hacker News

robots.txt is actually a really useful way to tell an attacker where to look for juicy content that the site owner doesn't want indexed, but following it is entirely voluntary. It's easy to imagine a dark web search engine that only indexes that content.

If you want your stuff to keep existing on the open web but be excluded from OpenAI training, just block GPTBot in your robots.txt:

https://platform.openai.com/docs/gptbot
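Per the OpenAI docs linked above, the opt-out is a standard robots.txt rule keyed on GPTBot's user-agent token. A minimal robots.txt that blocks GPTBot from the whole site while leaving other crawlers alone would look like:

```
User-agent: GPTBot
Disallow: /
```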



Just a thought, what about a dummy/honeypot path in robots.txt? If any request is made related to that path, block connections from that source?
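The honeypot idea above can be sketched in a few lines: advertise a decoy path in robots.txt (as a Disallow rule, so well-behaved crawlers never touch it), then blocklist any client that requests it anyway. This is a minimal illustration, not a production implementation; the path name, the in-memory blocklist, and the `handle_request` hook are all assumptions for the sketch.

```python
# Decoy path that appears ONLY in robots.txt as a Disallow rule
# (hypothetical name chosen for this sketch).
HONEYPOT_PATH = "/admin-secret/"

# In-memory blocklist; a real deployment would want persistence
# and an expiry policy rather than a bare set.
blocked_ips = set()

def handle_request(ip: str, path: str) -> bool:
    """Return True if the request should be served, False if blocked.

    Any client fetching the honeypot path has ignored robots.txt,
    so it gets added to the blocklist and denied from then on.
    """
    if ip in blocked_ips:
        return False
    if path.startswith(HONEYPOT_PATH):
        blocked_ips.add(ip)
        return False
    return True
```

In practice this check would sit in front of the application (e.g. as middleware or a reverse-proxy rule), and you'd likely block by network rather than single IP, since crawlers rotate addresses.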





