How much data do you think it encompasses? Approaching/surpassing Petabytes?

toomuchtodo · on Feb 26, 2020

Couldn't even hazard a guess. I can send a Backblaze-like chassis if needed, but if there's no cost to pull from the AWS public S3 bucket, I'll just spin up some VMs and enumerate everything into items in the Internet Archive. Trying to find the balance between time, cost, and inconvenience for all parties involved.

ChrisArchitect · on Feb 27, 2020

Still wondering and trying to get a sense of how much data has accumulated through the open data digitization process. Gleaning from the open data sessions held over the last few years that it might be multiple petabytes, but dunno...

ChrisArchitect · on Feb 28, 2020

Some of the people interested in the data and a lead at AWS got back to me on it:

An AWS query on the open dataset comes back with:

Total Objects: 4649789

Total Size: 312.5 TiB!
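
For anyone comparing that against the petabyte guesses upthread, here is a rough sketch of the unit conversion. It assumes the reported figure is in binary units (1 TiB = 2^40 bytes), and the variable names are just illustrative. It comes out to roughly a third of a petabyte, not multiple petabytes.

```python
# Convert the reported totals into petabytes and an average object size.
# Assumption: "312.5 TiB" is in binary units (1 TiB = 2**40 bytes).
TIB = 2**40   # bytes per tebibyte
PB = 10**15   # bytes per (decimal) petabyte
PIB = 2**50   # bytes per pebibyte
MIB = 2**20   # bytes per mebibyte

total_objects = 4_649_789   # "Total Objects" from the query output
total_size_tib = 312.5      # "Total Size" from the query output

total_bytes = total_size_tib * TIB
size_pb = total_bytes / PB
size_pib = total_bytes / PIB
avg_mib = total_bytes / total_objects / MIB

print(f"{size_pb:.3f} PB, {size_pib:.3f} PiB, avg {avg_mib:.1f} MiB per object")
```

The average is only a mean over all objects, so it says nothing about how the sizes are actually distributed.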