Hacker News

This is exactly the problem; information isn't necessarily being lost, but silted over and sometimes intentionally buried. Part of it is due to its natural loss of relevance, part of it is due to loss of popularity and attention, and part of it is deliberate commercial motive.



Information is being lost constantly on the internet. Whether it is CNET deleting old articles as an SEO tactic[0], domain names expiring, formerly popular sites like Geocities being erased altogether, Google mismanaging its Usenet archive, or once-popular blogs getting deleted over TOS or account-inactivity issues, the internet is certainly not forever. Archive.org isn't really a solution either, because it is not uncommon for domain squatters to use a robots.txt setting to get the Internet Archive to remove the domain from the Wayback Machine. You can't even rely on large social media platforms: people delete their accounts, some people auto-delete their old posts, and platforms decide to login-wall themselves, as happened with Twitter.

Link rot is a major problem that people don't recognize, especially for information that was only ever online. Most of the obscure web sites I used to read and hang out on are gone and many of the things I remember are now completely unverifiable because I didn't save a copy of every web site that ever influenced me.

My own unfinished game project from my teenage years vanished from the internet without a trace after I lost interest. I then lost all of the code, along with all of my other data from my teen years, in a hard drive crash around the time I finished high school. The mods I made for games and never distributed are sitting on old laptops in my closet that I haven't turned on in years and that may or may not even work anymore. I imagine everybody else who's been heavily online in the past has similar stories of just how ephemeral digital information is.

[0]: https://www.theverge.com/2023/8/9/23826342/cnet-content-prun...


> Archive.org isn't really a solution either because it is not uncommon for domain squatters to use a robots.txt setting to get them to remove the domain from the Wayback Machine.

Do they delete it? My understanding is that they simply unpublish it. Lost from the Internet, then, but not necessarily forever.

> and I lost all of the code along with all of my other data from my teen years in a hard drive crash around the time I finished high school.

Technically that data wasn't lost for good either with the hard drive crash. Provided there's an academic, personal, economic, cultural, etc. incentive to read it, I'm sure any old inflation-adjusted $50 magnetic microscope from the year 2080 would have been able to get it all back in a matter of moments.

Overall, I agree with your point. The answer is LOCKSS (the principle, not the project) and KISS, plus checksums, ECC, etc. HD-Rosetta/NanoRosetta is cool but doesn't seem super scalable or readable, MDisc was exciting but was also a market flop, and Memory of Mankind's ceramic tablets and the Arch Mission Foundation's glass hologram thingies have even bigger practicality problems. For now, so long as digital storage availability increases exponentially, you can probably just spin up Borg or something and keep accumulating backups of old files indefinitely.

But in general, anything that you don't actively invest the overhead to save can be assumed to be lost.
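
For the "checksum" part of that list, here is a minimal Python sketch, assuming the backups are just a directory tree of files. It records a SHA-256 digest per file and later reports anything that has gone missing or silently changed. The function names (`sha256_of`, `build_manifest`, `changed_files`) are made up for illustration and are not part of Borg or any other tool mentioned above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

Something like `build_manifest(Path("backups"))` would be run once and the result stored next to the data (and ideally in a second place), with `changed_files` run periodically against it. This only detects rot; recovering from it still needs a second copy or ECC.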


> information isn't necessarily being lost, but silted over and sometimes intentionally buried.

And that kind of touches on disinformation campaigns. A lot of noise is being deliberately and maliciously added so that it drowns out any information someone wants to suppress. An AI model trained on this corpus will have all the wrong ideas.



