If you've got a site with a lot of pages, bot traffic can get pretty big. A shopping site with a large number of products, a travel site with pages for hotels and things to do, anything covering movies or TV shows and actors: basically, anything with a large catalog will draw a lot of bot traffic.
It's been forever since I worked at Yahoo Travel, but bot traffic was significant even then; I'd guess roughly 5-10% of traffic was declared bots, and that was before Yandex and Baidu became aggressive crawlers, so I wouldn't be terribly surprised if a site with a large catalog that wasn't top 3 with humans saw a majority of its traffic from bots. For the most part we didn't have availability issues from bot traffic, but every once in a while a bot would really ramp up and cause problems, and we had to carefully design our list interfaces to keep bots from crawling lots of different views of the same list (while still making sure they saw everything in the list). Humans may very well want all the narrowing options, but it's not really helpful to expose "hotels near Las Vegas starting with the letter M that don't have pools" to Google.
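One common way to get that effect (a sketch of the general idea, not literally what we shipped) is to pick one canonical view per list and mark the faceted variants noindex, while still letting crawlers follow links through them so every item stays reachable. Rough Python sketch; the parameter names are made up for illustration:

    # Assumed facet params; anything here creates near-duplicate list views.
    FACET_PARAMS = {"letter", "pool", "sort"}

    def crawler_directives(path: str, params: dict[str, str]) -> dict[str, str]:
        """Pick meta-robots/canonical hints for a list URL."""
        if not FACET_PARAMS & params.keys():
            # The one canonical, unfiltered list: index it, follow pagination.
            return {"robots": "index,follow", "canonical": path}
        # Faceted variant: fine for humans, but don't let bots index
        # thousands of views of the same inventory. noindex,follow still
        # lets them reach every item through the canonical list.
        return {"robots": "noindex,follow", "canonical": path}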
I appreciate the response but I’m still perplexed. It’s not about the percent of traffic if that traffic is cached. And rate limiting also prevents any problems. It just doesn’t seem plausible that scrapers are going to DDoS a site per the original comment. I suppose you’d get bad traffic reports and other problems like log noise, but claiming it to be a general form of DDoS really does sound like hyperbole.
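To be concrete about the rate limiting: even a tiny per-client token bucket means a scraper that ramps up just starts getting 429s instead of hammering uncached pages. A rough Python sketch, with purely illustrative numbers:

    import time

    RATE = 2.0    # tokens refilled per second per client (illustrative budget)
    BURST = 20.0  # bucket capacity

    _buckets: dict[tuple[str, str], tuple[float, float]] = {}

    def allow(ip: str, user_agent: str) -> bool:
        """True if this request fits the client's budget; otherwise send a 429."""
        now = time.monotonic()
        tokens, last = _buckets.get((ip, user_agent), (BURST, now))
        tokens = min(BURST, tokens + (now - last) * RATE)  # refill since last hit
        if tokens < 1.0:
            _buckets[(ip, user_agent)] = (tokens, now)
            return False
        _buckets[(ip, user_agent)] = (tokens - 1.0, now)
        return True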