I suppose it's a pretty good approach for avoiding both upset website owners and possible lawsuits. While I'm not too sure I agree with it, I can appreciate that they have provided a very easy way for websites to opt out.
It's pretty shocking how many web designers, even experienced professionals, assume a site isn't being crawled because it hasn't "gone live" yet in their minds (i.e., no press release). If you have a site active on an IP without any access controls, you can almost be sure it is being indexed by someone. If it's not the default site, expect one of your users to leak the virtual host name. If it's SSL-protected, the host name might even be revealed in the certificate. I respect the work the Internet Archive is doing, but I'm also grateful that they will immediately and retroactively apply robots.txt if you discover you foolishly exposed a site prematurely.
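
For what it's worth, here's a rough sketch of how you could sanity-check a robots.txt before relying on it. It uses Python's standard `urllib.robotparser`. The `ia_archiver` user agent, the `/staging/` path, the example.com URLs, and the `is_allowed` helper are all assumptions for illustration, so check which user agent the crawler you care about actually honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one archiving crawler entirely and keep
# every other crawler out of a staging path. Names and paths are made up.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /staging/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ia_archiver", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/staging/page.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Keep in mind robots.txt is only a request that well-behaved crawlers choose to honor. It is not an access control, so a site that really isn't ready should still sit behind authentication or an IP restriction.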