With fewer and fewer printed works, I wonder how future archeologists will discover human knowledge from our era. Especially after a world-wide catastrophe. Human language written in the physical world seems more durable long-term than the electrons in a flash drive.
I feel like it's the exact opposite. We're drowning in data.
Consider: the Library of Congress has (literally) a copy of every work that has ever been in print in the U.S. The Internet Archive has archived much of the public internet. There are other resources like arXiv, Zenodo, etc. Many of these organizations are very conscientious about data retention and longevity.
Is there even more data that could be saved in principle? Of course. But the amount of data we're talking about is already so dramatically larger than what we have about, say, antiquity, that it seems like an absurd comparison. The problem in the future (excluding extinction-level events) will be wading through that data, which even in the handful of resources I've mentioned above will be so far beyond any single human's ability to understand that it's just unfathomable.
We're drowning in data in the present. But I predict that five hundred years from now these will be known as 'the lost years of history' because there wasn't enough long-term durable storage dedicated to preserving events that we do not think worth preserving today.
But you'll have a million AI generated recipes for Apple Pie from spam websites of the era to console you.
I'm generally pessimistic but I don't think this is true. We're still generating print information at an astounding rate. Plenty of organizations work on archiving information in various forms.
Historians will have plenty of information to sift through from this era in 500 years. Will they have a complete collection of ACM Journals? Perhaps not. Will they have ample information to get a clear picture of society from this era, and clear timelines of events, etc.? I would say yes, better than any other time so far.
I think people conflate "a lot of information won't survive in 500 years" with "we're going to lose everything."
I have literally thousands of pictures on my phone / in cloud storage. Odds are none of them will survive 500 years. That's OK, from a historical perspective -- 95% of them are cat pictures or memes anyway. 5% might document something interesting if it was all a historian had to puzzle together a picture of what life was like in the 2020s.
But it won't be, because we're literally producing trillions of digital artifacts. If even 1% (or probably even .1% or .01%) of those survive they'll have a richer visual representation of the 2020s than we have of other times in history.
(Whether we'll have any historians or humans in 500 years' time, that's the real question.)
I'm not saying that we are not generating print information. I'm saying that our signal-to-noise ratio is so low that, even if a lot of information survives, the chances that the 'good stuff' will be preserved are shrinking by the day. Unless we take special measures in the present, it will essentially be drowned out by junk.
This is exactly the problem; information isn't necessarily being lost, but silted over and sometimes intentionally buried. Part of it is due to its natural loss of relevance, part of it is due to loss of popularity and attention, and part of it is deliberate commercial motive.
Information is being lost constantly on the internet. Whether it is CNET deleting old articles as an SEO tactic[0], domain names expiring, formerly popular sites like Geocities being erased altogether, Google mismanaging its Usenet archive or once popular blogs getting deleted for TOS or account inactivity issues, the internet is certainly not forever. Archive.org isn't really a solution either because it is not uncommon for domain squatters to use a robots.txt setting to get them to remove the domain from the Wayback Machine. You can't even rely on large social media platforms because people delete their accounts, some people auto-delete their old social media posts and platforms decide to login-wall themselves like what happened with Twitter.
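To make the robots.txt mechanism mentioned above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It only illustrates how a blanket `Disallow` rule reads to a crawler that chooses to honor it. The `ia_archiver` token is the user-agent name historically associated with the Internet Archive's crawler, and the domain and the helper name `is_blocked` are made up for illustration; nothing here establishes what the Wayback Machine's current policy actually is.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a new domain owner might publish.
# "ia_archiver" is assumed to be the token the archive crawler looks for.
SQUATTER_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt asks user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/old-fan-page.html"  # placeholder URL
    print(is_blocked(SQUATTER_ROBOTS_TXT, "ia_archiver", page))
    print(is_blocked(SQUATTER_ROBOTS_TXT, "SomeOtherBot", page))
```

The rule only names one crawler, so every other user agent is still allowed; the file is a request, not an enforcement mechanism.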
Link rot is a major problem that people don't recognize, especially for information that was only ever online. Most of the obscure web sites I used to read and hang out on are gone and many of the things I remember are now completely unverifiable because I didn't save a copy of every web site that ever influenced me.
My own unfinished game project from my teenage years vanished from the internet without a trace after I lost interest and I lost all of the code along with all of my other data from my teen years in a hard drive crash around the time I finished high school. My mods I made for games and never distributed are sitting on old laptops in my closet I haven't even turned on in years that may or may not even work anymore. I imagine everybody else who's been heavily online in the past has similar stories of just how ephemeral digital information is.
> Archive.org isn't really a solution either because it is not uncommon for domain squatters to use a robots.txt setting to get them to remove the domain from the Wayback Machine.
Do they delete it? My understanding is that they simply unpublish it— Lost from the Internet, then, but not necessarily forever.
> and I lost all of the code along with all of my other data from my teen years in a hard drive crash around the time I finished high school.
Technically that data wasn't lost for good either with the hard drive crash. Provided there's an academic, personal, economic, cultural, etc. incentive to read it, I'm sure any old inflation-adjusted $50 magnetic microscope from the year 2080 would have been able to get it all back in a matter of moments.
Overall, I agree with your point. LOCKSS (the principle, not the project) and KISS, and checksum and ECC, etc. HD-Rosetta/NanoRosetta's cool but doesn't seem super scalable or readable, MDisc was exciting but was also a market flop, and Memory of Mankind's ceramic tablets and the Arch Mission Foundation's glass hologram thingies have even bigger practicality problems— For now, so long as digital storage availability increases exponentially, you can probably just spin up Borg or something and keep accumulating backups of old files indefinitely.
But overall, anything that you don't actively invest the overhead to save can be assumed to be lost.
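As a rough sketch of the "lots of copies plus checksums" idea above: hash every copy of a file and flag the ones that disagree with the majority. This is only loosely in the spirit of the LOCKSS principle, not the LOCKSS project's actual protocol, and the function names `sha256_of` and `find_damaged_copies` are invented for this example.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(copies: list[Path]) -> list[Path]:
    """Return the copies whose checksum disagrees with the majority."""
    if not copies:
        raise ValueError("need at least one copy to check")
    digests = {path: sha256_of(path) for path in copies}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority we cannot tell which copies are intact.
    if votes <= len(copies) / 2:
        raise ValueError("no majority among copies")
    return [path for path, d in digests.items() if d != majority_digest]
```

A damaged copy found this way would then be overwritten from one of the intact ones. The check only helps if somebody actually runs it periodically, which is the "overhead" part.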
> information isn't necessarily being lost, but silted over and sometimes intentionally buried.
And that kind of touches on disinformation campaigns. A lot of noise is being added deliberately and maliciously so that it drowns out whatever information someone wants to suppress. An AI model trained on this corpus will have all the wrong ideas.
>> With fewer and fewer printed works, I wonder how future archeologists will discover human knowledge from our era. Especially after a world-wide catastrophe. Human language written in the physical world seems more durable long-term than the electrons in a flash drive.
> Consider: the Library of Congress has (literally) a copy of every work that has ever been in print in the U.S. The Internet Archive has archived much of the public internet.
Those are both very centralized.
The benefit of print media is its forced-decentralization and stand-alone nature. It can survive for 100+ years on a shelf, and I don't think any digital data format could match that.
> Is there even more data that could be saved in principle? Of course. But the amount of data we're talking about is already so dramatically larger than what we have about, say, antiquity, that it seems like an absurd comparison.
That sheer amount of data may work against preservation in another way: too much noise distracting from identifying and preserving the valuable stuff.
> The problem in the future (excluding extinction-level events) will be wading through that data, which even in the handful of resources I've mentioned above will be so far beyond any single human's ability to understand that it's just unfathomable.
Not necessarily: all that data is very fragile. Besides format conversion and migration issues (about that Flash-based website...), you have site/platform longevity, and the time-bomb of cloud hosting costs (someone stops paying a bill? Poof, it's gone). Then you have the spectre of a dark age: a few decades of a teetering industrial base after a major world war that destroys a couple of high-tech manufacturing centers, and that data is pretty much all gone.
> The benefit of print media is its forced-decentralization and stand-alone nature. It can survive for 100+ years on a shelf, and I don't think any digital data format could match that.
MDisc definitely can, if its claims are to be believed. Maybe you can find some specialized tape too?
Hard disks and normal optical disks probably could too, as long as you have a sufficiently large redundant array of them with error-detection on your bookshelf— If it can't last that long, just keep doubling the number of starting drives until it can.
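The "keep doubling the drives" argument can be put in numbers. A back-of-the-envelope sketch, assuming each drive independently survives the century with some probability: the chance that at least one of n copies survives is 1 - (1 - p)^n. The independence assumption is generous (drives on the same shelf share fires, floods, and manufacturing defects), and the per-drive probabilities used below are placeholders, not measured figures.

```python
def survival_probability(p_single: float, n_copies: int) -> float:
    """Chance that at least one of n independent copies survives."""
    return 1.0 - (1.0 - p_single) ** n_copies


def copies_needed(p_single: float, target: float) -> int:
    """Smallest number of copies whose joint survival reaches target."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while survival_probability(p_single, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical: each drive has a 10% chance of being readable later.
    print(copies_needed(0.10, 0.99))
```

Even a pessimistic per-drive figure yields a manageable shelf of drives, as long as failures really are independent and something can still read them.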
If you want to get fancy, you can try to pay for something around HD-Rosetta, or Microsoft/Hitachi/Soton's printed glass tech.
And then of course if you wanna get simple, you could always stick your digital data into a mountable book/scroll of printed QR codes (or equivalent).
Of course, none of these options are actually deployed at any significant scale.
But I'm not sure you can realistically expect any arbitrary book to last for centuries either. IIRC, modern print media uses acidic paper as a cost-cutting measure, causing it to become brittle and crumble over time…
"[…] causes huge losses in library and archives collection […] 90% of the resources published by the mid-1990s […] have all the features of acidic paper. […] established to care for the heritage of the past, are not able to effectively carry out their mission […] it is not possible to save all the documents from the 19th and 20th centuries […] In recent years, most books have been printed on acid-free paper […]".
>> The benefit of print media is its forced-decentralization and stand-alone nature. It can survive for 100+ years on a shelf, and I don't think any digital data format could match that.
> MDisc definitely can, if its claims are to be believed. Maybe you can find some specialized tape too?
1) I don't think they actually make real M-Discs anymore (the DVDs). There are some M-Disc branded BD-Rs, but I don't think those use any different technology than standard BD-Rs.
2) They're not stand-alone, and by that I mean they need a player or they're unreadable. IMHO, it's unlikely there will be many working DVD/Blu-ray players in 100 years, and very unlikely there will be any players for "some specialized tape."
I've been working a bit with VHS recently, and it's eye opening to see a once ubiquitous format start to drift into unreadability (it's not there yet, but a lot of the specialized equipment needed to do a good job is becoming expensive, hard-to-find, and breaking down).
> Consider: the Library of Congress has (literally) a copy of every work that has ever been in print in the U.S.
This is not true. The Library of Congress has a copy of works that have been registered for copyright with them. I am sure millions of works are created in the US every year that never get registered with the Library of Congress.
While a registration with the LOC isn't a bad idea, in the USA at least authors of written works automatically have copyright of their work, regardless of whether they have registered it for copyright with the LOC.
A single EMP event, like the Carrington Event, or just a single launch of something like Starfish Prime, could wipe all of that out in the blink of an eye.
Something like this occurring seems inevitable given a long enough timeline, and 500 years doesn't seem that far away.
On the other hand, electronic versions can be copied over and over. If you think about it, we have preserved many ancient texts that have been copied again and again, like biblical texts, Greek philosophers, etc., even though the originals are long gone. And those were copied by hand.
I think that copying onto new media will be much safer, particularly as the cost of storage keeps getting cheaper, and a cheap consumer drive today can store many times all the books that have been written in the world to date in text format.
Yeah. It's probably a data management and cultural problem, not so much a technological one.
It would be comparatively easy to maintain historical continuity of old data over time, as long as storage costs keep decreasing. Could also have a system like Git or Fossil, that automatically retains history, or just move stuff to an `archive.subdomain` when you're done hosting it. The Internet Archive's budget was only $36 million in 2019; it seems like it'd also add barely a rounding error for any government or large corporation to sponsor it as a safe place to park data long-term.
The problem is that people don't always copy old data to new media, or make redundant copies of current data, and do the other things you need to do to keep digital data safe. Instead we leave it sitting on old hard drives that are slowly rotting away, or simply remove/unpublish/delete it when it's no longer getting enough views. And this is exacerbated by the fact that the systems we use don't provide easy and scalable ways to preserve data by default, so it incurs extra effort and cost to do so.
Society-level information retention and integrity doesn't directly help push out new monetizable products, so there's little economic incentive for anybody to care about it.
The culturally important stuff will probably survive, because people will copy that, just as has always been the case. But as was also the case through history, much more will also be lost, which kinda sucks, especially given the sheer amount of information produced and at risk these days, and the thought that we probably could save all or most of it given modern technology.
> The problem is that people don't always copy old data to new media, or make redundant copies of current data, and do the other things you need to do to keep digital data safe
But how many selfies from the medieval era would you want to preserve? Instagram is receiving tens of millions of new photos a day.
I'm more worried about preserving information for today rather than tomorrow. E-books you own on subscription services are altered without your consent and can easily be recalled if deemed inappropriate.
Large-size DVD drives also work today. But again, they still need power, a computer/interface, etc. A mesh network plus a reader device partially solves some of those issues, but it doesn't remove the need for fancy setups just to be able to read something.
Even if we lose the ability to build readers, the understanding that these digital media contain valuable information will never be lost. It isn’t rocket science to build a room-sized DVD reader if needed.
> It isn’t rocket science to build a room-sized DVD reader
It is. We have built rockets for 700+ years, a V2 reached space in 1944 and we went to the moon over half a century ago, over a decade before even the compact disc (1982).
If society collapses completely, I think it will be decades, possibly centuries before we even consider trying to read these. At that time, many DVDs might not be readable by DVD players anymore.
In such a catastrophe it is unlikely that preserving printed works would be a priority, and the scenario calls into question the very existence of 'future archeologists'. Archeology is a consequence of an age of abundance that would be gone in such a scenario.