
I wouldn't be surprised if the indexed subset of Facebook alone were more than 1000x larger than the entire indexed web of 20 years ago. The web as a whole has probably grown by a factor of millions, if not hundreds of millions.


Personally I wouldn’t mind if trash/spam sites like Facebook/Twitter were omitted from the database, along with non-English content, seeing as I only speak English. Remove trash/spam/non-English content from the db and that 300TB would shrink substantially, to the point where it’s feasible for a single person to store. After all, even storing the whole 300TB db would only cost about $4000 in hard drives, which isn’t as out-of-reach as some people here are making it seem.
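(Rough sanity check on that figure, assuming a bulk price of around $13/TB for large hard drives, my estimate rather than a number from the thread: 300 TB × ~$13/TB ≈ $3,900, so roughly $4,000 before redundancy, enclosures, or power.)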



