Couldn't even hazard a guess. I can send a Backblaze-like chassis if needed, but if there's no cost to pull from the AWS public S3 bucket, I'll just spin up some VMs and enumerate everything into items in the Internet Archive. I'm trying to find the balance between time, cost, and inconvenience for all parties involved.
I'm still trying to get a sense of how much data has accumulated through the open data digitization process. I'm gleaning that it might be multiple petabytes, based on the open data sessions held over the last few years, but I don't know for sure.