This is a fair point; however, blacklisting isn't a perfect solution either. It requires continuous manual effort to go through the logs and block bad bots, and if some new bot were to misbehave and crawl too aggressively, blacklisting would only help after the fact.
I do think I made a mistake, though. To your point, I shouldn't block crawlers that both behave and are attempting to help my site in some way by driving traffic to it (i.e. search engines). Whether or not they are currently driving traffic to the site is not important. I'll whitelist Yandex, Baidu, Scoutjet and any other related bots I see, and edit the post.
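For what it's worth, a whitelist like that might look something like the following in robots.txt — a minimal sketch only; the user-agent tokens shown (e.g. Baiduspider for Baidu) are assumptions and should be checked against each crawler's own documentation before relying on them:

User-agent: Yandex
Disallow:

User-agent: Baiduspider
Disallow:

User-agent: ScoutJet
Disallow:

User-agent: *
Disallow: /

An empty Disallow line means "nothing is disallowed" for that bot, while the final catch-all blocks everyone not explicitly listed — which is exactly the new-search-engine problem raised below.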
Whitelisting a couple of bots now doesn't help at all for any new search engines trying to start up. What are they to do, contact every site admin individually?
> if some new bot were to misbehave and crawl too aggressively, blacklisting would only help after the fact.
In that case, don't blacklist all bots; simply add a crawl delay for any bots that you haven't specifically allowed:
User-agent: *
Crawl-delay: 10
This allows minor bots to continue crawling the site, while cutting back on bandwidth costs for the few that are being overly aggressive.
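Combining the two ideas, you could exempt the bots you trust from the delay while throttling everyone else — again only a sketch, and note that Crawl-delay is a nonstandard directive that not every crawler honors (Googlebot, for instance, ignores it):

User-agent: Googlebot
Disallow:

User-agent: *
Crawl-delay: 10

Trusted bots match their specific User-agent block and crawl at full speed; unknown bots fall through to the wildcard block and get rate-limited instead of banned outright.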