Hacker News

Not necessarily. I have a website where 95% (maybe even more) of the traffic is generated by crawlers. If some of them behave badly, it is fair to exclude them with my robots.txt.

But of course, the ones behaving badly tend not to respect robots.txt, so you end up banning the IP or IP block instead.

And I am being generous here: a crawler must really be a piece of crap before I start blocking it.
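For reference, asking a single misbehaving crawler to stay away in robots.txt looks like this (the bot name here is a made-up example, not a real crawler):

```
# Exclude one badly behaved crawler (hypothetical name)
User-agent: BadBot
Disallow: /

# All other crawlers may fetch everything
User-agent: *
Disallow:
```

Since robots.txt is purely advisory, a crawler that ignores it has to be blocked at the server or firewall level instead, as the comment notes.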

Deny-listing/banning bad crawlers is fine, especially if they ignore robots.txt.

But allow-listing only particular crawlers is collusion.


The parent comment is talking about allow-listing (aka 'whitelisting') just a few crawlers, like Google's.
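The allow-listing pattern being criticized looks like this in robots.txt (a minimal sketch, using Googlebot as the single permitted crawler):

```
# Allow only one crawler...
User-agent: Googlebot
Disallow:

# ...and shut out everyone else, including well-behaved bots
User-agent: *
Disallow: /
```

The objection in the thread is that this blanket `Disallow: /` for everyone except a chosen few locks out compliant newcomers, whereas deny-listing only punishes crawlers that actually misbehave.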


