
How much data do you think it encompasses? Approaching or even surpassing petabytes?



I couldn't even hazard a guess. I can send a Backblaze-style storage chassis if needed, but if there's no cost to pull from the public AWS S3 bucket, I'll just spin up some VMs and enumerate everything into items on the Internet Archive. I'm trying to strike a balance between time, cost, and inconvenience for all parties involved.
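
For what it's worth, here's a minimal sketch of that enumeration step, assuming the dataset sits in a public S3 bucket that allows anonymous reads (the bucket name below is a placeholder, not the real one):

    # Enumerate a public S3 bucket without AWS credentials.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    BUCKET = "example-open-data-bucket"  # placeholder; substitute the real bucket

    # Unsigned client: works for buckets that allow anonymous access.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # Each key could be queued here for download and re-upload
            # as an Internet Archive item (e.g. with the `internetarchive`
            # Python package's upload() helper).
            print(obj["Key"], obj["Size"])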


I'm still trying to get a sense of how much data has been accumulated in the open data digitization process. Based on the open data sessions over the last few years, I gather it might be multiple petabytes, but I don't know for sure.


Some of the people interested in the data and a lead at AWS got back to me on it:

An AWS query on the open dataset comes back with:

    Total Objects: 4649789
    Total Size: 312.5 TiB
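
For anyone who wants to reproduce that: the output format matches what the AWS CLI prints for something like "aws s3 ls s3://<bucket> --recursive --summarize --human-readable" (bucket name omitted here since it isn't given above). The same totals can be computed in Python along these lines, again with a placeholder bucket name:

    # Compute the same totals with boto3 (anonymous access assumed).
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    BUCKET = "example-open-data-bucket"  # placeholder; substitute the real bucket

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    count = 0
    size = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]

    print(f"Total Objects: {count}")
    print(f"Total Size: {size / 2**40:.1f} TiB")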



