
> - Scraping without a contact method, or at least some unique identifier (like your project's codename), in the user agent string.

This is a very effective way to make sure you won't get any scraping done!






Tell that to Googlebot, Bingbot, Petalbot, SemrushBot, MJ12bot, MojeekBot, DotBot, YandexBot, SeznamBot, Barkrowler, AhrefsBot, DuckDuckBot, AcademicBotRTU, Bytespider, Applebot, ZoominfoBot, TelegramBot, TwitterBot, SemanticScholarBot, redditbot, Pinterestbot... From a quick peek at my access log, all of them include either a link (most) or an email address (ZoomInfo, TikTok/ByteDance, DotBot, and that academic bot).
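If anyone wants to run the same kind of check on their own logs, here is a rough sketch of how I'd classify a user agent string as carrying a link, an email address, or neither. The regexes are deliberately loose, the function name `contact_kind` is just something made up for this example, and the user agents in the usage note are hypothetical, not copied from a real log.

```python
import re

# Assumed, deliberately loose patterns: any http(s) URL or anything
# email-shaped counts as a contact method.
URL_RE = re.compile(r"https?://[^\s;)]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contact_kind(user_agent: str) -> str:
    """Return 'link', 'email', or 'none' for a user agent string."""
    if URL_RE.search(user_agent):
        return "link"
    if EMAIL_RE.search(user_agent):
        return "email"
    return "none"
```

For example, `contact_kind("Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)")` gives `"link"`, while a bare browser-style string with no URL or address gives `"none"`. Pulling the user agent field out of each log line is left out because it depends on your server's log format.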

Very few individual bots don't follow this good practice. Most of the IP ranges of the violating bots are owned by Huawei (a few are Huawei Cloud, so it could be anyone, but the majority seem to be Huawei themselves), and the remainder are all small beans as far as I remember (a few thousand requests in a day, after which they disappear forever, for example).



