I've been noticing this a lot lately in one of my little hobbies, Halloween events. There's so much history from events in the early 2000s that has been lost. So many links in ancient web forums 404 and aren't in the Internet Archive. Info from the 2010s is much better but still missing a lot of media. As an individual, I certainly don't have the resources of some of the organizations listed in this article, but I've been trying to do what I can and mirror important sites. I dread seeing what the future looks like as more and more of the communities in my niche move onto Discord and away from more traditional web forums.
ETA: The issue I face is two-fold. More pressingly, there are files just missing. Nothing I can do about that barring someone magically having them downloaded to an old PC. But also interesting is working with old formats. The 2000s/early 2010s Halloween Horror Nights websites were all written in Flash. They have tons of little Easter eggs and information for event obsessives like me. Between fan backups and the Internet Archive, the files are pretty complete, thankfully. But since Flash has been dead for a while, I have to rely on the Ruffle Flash emulator to get it running on the web, and that doesn't work super well. On my list of things to do is to contribute to the project to try to get some of the files working.
In case anyone is curious about my backup of the event or if you're also a fan and have any of the files listed to share and archive, my site is hhncrypt.com!
Just yesterday I went searching for a memorial page for someone who died in 2005. It's still been maintained ever since then, and I was grateful to read through it again.
Even a single-file website with 2 photos can be important to somebody. Thanks.
Install Firefox ESR 52 (or one of the various Pale Moon offshoots that still support Netscape plugins) and Flash Plugin 32.0.0.371. That's it, you have a dedicated browser with Flash working the exact same way it used to. As long as that software remains compatible with newer operating systems, you don't even need to start a virtual machine. Of course, Flash content that requires data from external servers won't work without them (this can sometimes be fixed with a proxy server and manual fiddling), and some old Flash versions can be partially incompatible with the final plugin, but that's a deeper dive into that technology.
I live in a small country with a weird language and with a lot of our local pop music...
The new music is fine, everything is on youtube.... for now! ... but the older songs are disappearing.
Years ago, we had a bunch of "mp3 sites" (websites where you could download pirated mp3s) that disappeared, and a huge local torrent tracker, also focusing on local stuff, that lost most of its data in the OVH datacenter fire and slowly died. Some stuff was put on YouTube 15 years ago, then got copyright claimed 5 years ago and was removed. The music groups don't exist anymore, so the CDs aren't published anymore, second-hand shops are very rare and cater mostly to LPs and tourists, and the groups stopped existing before they could sign deals with the publisher for streaming services.
So yeah, it's not just "those semi-personal photos taken on a party and kept by maybe one or two people", but also pop-music that used to be on the radio all the time in late 80s, early 90s, and is just..gone!
I'm sure that there are data hoarders somewhere, that have mp3s of all those songs somewhere, but unless we get some p2p type of service like gnutella/ed2k/kazaa working again, I won't be able to find them anymore.
"I'm sure that there are data hoarders somewhere, that have mp3s of all those songs somewhere, but unless we get some p2p type of service like gnutella/ed2k/kazaa working again, I won't be able to find them anymore."
Such archives would need to be kept online though. If only archived (or hoarded), it's just data in a box that no-one can access.
Current internet is really poor at handling this long tail. Eg. a movie can be streamed & torrented, millions see it, many have it in personal archives, but 1 year later the torrent swarm has died out, and those personal archives of the downloaders aren't online. And then copyright holder pulls it from their streaming service.
Result: nowhere to be found. Even though popular not long ago.
Copies in personal archives don't count for much if others can't access that data.
For a long time, “So This is Paris” (1954 starring Tony Curtis) was my white whale, as I could not find a copy anywhere, legally or otherwise. I had seen a clip that somebody had posted on youtube, and wanted to see the full movie. The clip was taken down by a copyright claim, and still the movie was unavailable.
Eventually, after searching on and off for a few years, I found a copy on a Russian streaming site, but the movie has effectively been erased from availability.
When I was in university in the late 90s and early 00s, my friends and I had all kinds of music and TV clips downloaded from p2p sharing that I have since lost and can't find. Clips from shows, remixes of songs. Like you say, somebody must have them, but they're not findable anymore.
I recently logged into Soulseek for the first time in about a decade, and I was unable to find lots of stuff that was commonly shared in the early millennium. It’s no secret that the generation most interested in audio filesharing is graying, and as many people raise families and have less and less time for obsessive music collecting, they fall away from the scene.
I love to see more interest in the topic of digital continuity.
My philosophy around this is "File over app" — if you want to create digital artifacts that last, they must be files you can control, in formats that are easy to retrieve and read. Use tools that give you this freedom.
In the fullness of time, the files you create are more important than the tools you use to create them. Apps are ephemeral, but your files have a chance to last.
I agree with “files over apps.” I used google docs for my personal notes until one day I had shoddy internet and realized how awful it is to lose access to something important that you wrote down.
I now use iA Writer which allows you to save your notes in markdown from iPhone, syncs via iCloud, and then I can continue from my laptop. Admittedly the native Notes app has some better functionality around sharing and searching. However, Notes saves into a SQLite db, which would be fine, but it’s not trivial to view the tables/schema in there.
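For anyone who wants to peek inside a SQLite-backed notes store, here is a minimal sketch using Python's standard `sqlite3` module. It only lists the tables and their `CREATE` statements; it assumes nothing about what the Notes database actually contains, and the file name `notes_copy.sqlite` is a hypothetical copy of whatever database you want to inspect (work on a copy, not the live file).

```python
import sqlite3
from pathlib import Path


def describe_sqlite(path):
    """Return {table_name: CREATE statement} for every table in the file."""
    # Build a file: URI with mode=ro so inspection can never modify the database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    # 'notes_copy.sqlite' is a placeholder path, not a known location.
    for name, create_sql in describe_sqlite("notes_copy.sqlite").items():
        print(name)
        print(create_sql)
        print()
```

Knowing the table names is only the first step; the column contents may still be encoded in app-specific ways, which is the part that makes plain markdown files easier to live with.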
I think there is something to be said about what is worth archiving. I don't know what it is that should be said though. It seems weird to me that as a society we might be saving things such as a 10 hour video of white noise for another 100 years rather than some personal blogs. What is and isn't worth saving?
It will probably end up being the most popular things, the most viewed or read. More copies of it, more likely to be archived.
This reminds me of books. I’m sure the majority of books from over a hundred years ago are lost because they weren’t popular. We haven’t really noticed their absence…
> I’m sure the majority of books from over a hundred years ago are lost
Especially if you include independently published books that weren't widely circulated. I wonder what percentage of total books this is.
My grandfather published a book before he passed away. It was never sold online or in any big retail stores. Once the last hard copy is lost, it's gone forever.
That's interesting. Many countries have laws (or customs) requiring that everything published in the form of a book be submitted to the national library. I know this is done here in Poland too, because my partner was having her book published by a small publisher and "providing a copy of the book to the national library" was one of the publisher's responsibilities. I have no idea if it's a law or just a custom.
> I’m sure the majority of books from over a hundred years ago are lost
If they were in one of the university libraries that Google scanned, they're not "lost." But you're right; you can't read them. Congress should mandate that the Library of Congress, at least, get a copy to preserve them for the ages.
When you publish a book or magazine in France, you’re required to give 2 copies to the national library for archival purposes. Doesn't something like that exist in other countries?
It certainly does in Spain. We even extended it to videogames, although I don't know how much that achieves when so many games are barely playable before the first few patches, have much of their content released in future updates and many are unplayable after the servers close.
Not all books in the state libraries are equal. Historical copies and popular authors (popular among researchers, a much bigger set already) are exhibited and get attention, John Doe's book of family recipes gets sent to some giant dark warehouse people rarely visit.
It is easy to forget that it is an 18th century solution born from 18th century approach to knowledge. Back then, bibliographies of everything printed in certain year in certain country could be compiled, and they were supposed to be more than just lists, to help other men of books keep up with Progress.
Apparently it was required by the Library of Congress in the U.S., but the Supreme Court might have nixed that because of the Constitution's 5th Amendment (must be reimbursed if required to turn over property).
This is pretty much the best marketing for IPFS. The availability and number of backups of any piece of data is directly correlated with the number of people that use it.
For example, if LLM NN model weights are distributed with IPFS instead of corporate infrastructure (basically zero redundancy) the popular models would be very available, and have essentially near zero chance of being lost.
To state that again, the llama models likely have tens of thousands of downloads, which would mean tens of thousands of servers and backups of the data, versus what we have now, which is essentially just one.
We need IPFS for data distribution. Tightly knit integration with git repos is an obvious match as well.
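To make the idea concrete, here is a toy sketch of content addressing, the principle this argument rests on. It is not the IPFS API: real IPFS uses CIDs built on multihash plus a peer-to-peer network, while this sketch uses a plain SHA-256 hex digest and in-memory dictionaries. The names `ContentStore` and `fetch_from_any` are made up for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not from a server name.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]  # raises KeyError if this peer lacks the blob
        # Anyone can verify a copy, so untrusted peers are fine.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored bytes do not match their address")
        return data


def fetch_from_any(key, peers):
    """Return the blob from the first peer that holds a valid copy."""
    for peer in peers:
        try:
            return peer.get(key)
        except (KeyError, ValueError):
            continue
    raise KeyError(key)
```

Because the address depends only on the bytes, every downloader who keeps a copy can serve it under the same key, which is why popularity translates into redundancy in this model.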
I'm sure that I don't know what you mean. My employer is on a paid plan for Slack, and searches cover everything, as far back as I wish to go. Are you thinking of the limitations on the free license?
Well, I would call that rather disingenuous, because "free trial Slack" is certainly not the default for businesses which actually depend on it, and it's not about search capability at all: it's about retention of data.
It would be very interesting for us to learn about something in ancient Egypt that is equivalent to white noise today. I think the issue of storage should be solved and made abundant. We should not be worrying about what to save, but rather what if we cannot save.
AI is going to give a lot of that data that would have otherwise died eternal life. As the tech evolves, businesses will be able to monetize by selling data (for walled gardens) or their pages will all be scraped, cleaned up and resold by multiple orgs (for stuff on the open web).
I think there is something to be said about what is worth archiving. I don't know what it is that should be said though. It seems weird to me that as a monastery we might be saving things such as a 10 volume satirical piece about a forgotten Greek tyrant for another 100 years rather than some personal thoughts of an Egyptian philosopher. What is and isn't worth saving?
I think it is important to focus on portability and archivability of our online spaces.
Using today's technology, it is possible to allow exporting all content as basic file formats such as txt + zip.
In addition, PKI (public/private key infrastructure) allows us to decouple a user's private and public identities, meaning the public identity can now be portable between servers.
What does all this mean for the average user or community operator?
It means your community can be completely transparent, auditable, and PORTABLE, allowing any user to archive the whole thing and clone it to another server -- right away or years later.
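As a rough illustration of the txt + zip export described above, here is a minimal sketch using only Python's standard library. The post structure (dicts with `id`, `author`, and `body` keys) and the function names are assumptions made for this example, not the format of any particular framework.

```python
import zipfile


def export_posts(posts, zip_path):
    """Write each post as a plain .txt file inside one zip archive.

    `posts` is an iterable of dicts with hypothetical keys:
    'id', 'author', and 'body'.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for post in posts:
            text = f"author: {post['author']}\n\n{post['body']}\n"
            # str data passed to writestr is stored as UTF-8.
            archive.writestr(f"posts/{post['id']}.txt", text)


def import_posts(zip_path):
    """Read the archive back as {file_name: text}, e.g. to clone it elsewhere."""
    with zipfile.ZipFile(zip_path) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in sorted(archive.namelist())
        }
```

The point of the design is that the archive stays readable with any unzip tool and any text editor, even if the software that produced it is long gone.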
I've been writing a framework for this type of system for several years now, and if you're curious, you know where to look.
What worries me isn't so much that data is lost, or even is thrown away, but rather it's being actively buried because it's hidden behind a commercially-driven search-based interface that deprioritizes old, information-rich but less-entertaining and certainly less-profitable media. Searching for anything has regressed to the mean of what SEO and ad dollars want to show you.
Looks like the article in question is getting the hug of death. But information is being lost in droves even today. Entire YouTube channels can vanish overnight, for various reasons, taking years of content with them. Without any backups floating around out there this stuff is gone forever, and this is assuming we even know about any backups.
> Without any backups floating around out there this stuff is gone forever
I think part of the problem is that most backup efforts result in lawsuits. Related, I wonder how much the data stores, like those that OpenAI used, have preserved, and I wonder how much they will purge from their servers, to be lost forever, as the lawsuits increase.
For Youtube, it also has compression rot. It's not a storage solution, as I've learned. All the videos I uploaded have slowly reduced in resolution, bitrate, and quality, over the years. Those that are more than a decade old have become a blurry mess that I can barely see, at a fraction of the resolution. I can't blame them. They don't really have views, so they're a money sink.
The article starts with an asymmetric comparison between the survival of a single set of documents from 2,000 years ago and the loss of some documents today. We don’t have all the songs 16-year-olds wrote 2,000 years ago. But that is largely what was lost in MySpace’s data loss. The truth is, we’ve lost the vast majority of what was created in every era, and there is no reason to believe we are losing more today than 2,000 years ago or even 100 years ago.
But we are losing more these days than 2,000 years ago. And we save lots more as well. Think of how many billions of people can write or paint now and can take photos. How many could write or paint then?
We should think of digital information more as graffiti. It won’t be around forever, I’ve seen most of the old web that I experienced as a child simply disappear, nothing but a memory now. Enjoy information while you have it, eventually it will be lost, like tears in the rain.
This reminds me about the "Life After People" television series [1]. In several episodes it presents possible or imagined scenarios in case of a sudden human removal, in various areas: cities, animal life, oceans, etc. Every single bit of modern life requires maintenance. They could have added an episode about the digital information :)
My story is simple: I supported an important historical site, but due to war and personal issues I was not able to transfer money to the .ru zone registrar. I have the data, but the domain was lost.
Generational churn means eventually everyone who knows why the nested dolls were nested as they are will be gone. Humans will create a new set of nested dolls they can grok.
Paraphrasing Thomas Jefferson: clearly the dead do not rule the living.
Edit: forgot this point… Maybe he said that; it’s unverifiable for us. Maintaining a hallucination is all it ends up being.
Reality we see, smell, hear, and touch is what we get. There’s no violating physics. Let it be lost. It’s going to happen anyway.
When I was a kid, The Brave Little Toaster was easy to find and watch. Now it seems to be in some kind of void, where no streaming service has the rights and few places have physical copies. In my opinion, it’s an amazing piece of work that deserves better. And it is an early work from some very famous animators. If that can be lost to time in just a few decades, there must be a lot of work that is disappearing.
One area where I expect this digital rot to have significant effects is with obituaries. It will likely frustrate the next few generations of genealogists hunting for records of early 21st century ancestors.
Obituaries that appeared in print newspapers during the 20th century were easily disseminated and decentrally archived (typically by loved ones and libraries), making them relatively rot-tolerant.
Distribution isn’t a problem for digital obituaries, and in many ways the web is better than print in this respect.
But when it comes to preservation, there are many factors that make digital obits in their current state particularly susceptible to rot. They tend to be centrally archived and often behind paywalls, making them susceptible to digital rot and difficult for organizations acting in the public interest to archive.
The for-profit company Legacy.com controls a strikingly large share of the market for digital obituaries. It partners with funeral homes and newspapers, and in many cases when a visitor browses obituaries on the website of a local newspaper or funeral home, they’re actually redirected to Legacy.com, which hosts the content.
Unfortunately, the newspapers and funeral homes themselves often don’t maintain their own copies of the obituaries. What happens if Legacy.com or one of the smaller memorial sites goes out of business or experiences some sort of data loss? Because of the centralized nature of how these digital obituaries are stored, it’s possible that very few other organizations will have archived copies of the content.
What's preserved is also getting more and more sterile. Material that's now unfashionably rude or just plain unprofitable has a tendency to just be erased. I invite you to look through your Liked YouTube videos and witness how much of what you enjoyed has been taken from you forever.
Consider that everything you witness has an expiration date and if you don't save it maybe nobody else will.
Imgur recently purged an untold number of pictures from their servers. How much knowledge has simply been discarded because it was too expensive to keep it?
Just ran into this. Many companies I request my data from have retention periods and would rather be rid of it. Rare are a couple that do seem to keep _everything_ though
I think the Library of Congress should be given a well funded mandate to create and maintain a digital archive of the entire publicly accessible internet.
The government is probably already doing it with the NSA, but we normal people can’t access it.
This is the first time I've heard the phrase "Digital Dark Ages", and without any further context I'd say it's a great name for what's happening right now. I feel like every year we're taking more steps backward with the ever-increasing ratio of user-hostile platforms wresting control and dignity away from the end-user. In some 100-200 years I think humanity will course-correct and they'll look back at this time as the "lost age".
Most writing on paper is short lived, and it is generally inaccessible while it is retained. For example, a lot of writings authored by lawyers such as contracts, mortgage documents, etc. have specific retention requirements, and they are destroyed upon expiration.
Even in categories such as fiction, there is no reason to keep every novel written. Only a tiny percentage have enduring literary value.
For society, as for the individual, forgetting is an important process in maintaining sanity.
I've played in 2 Diplomacy tournaments. won my first. lost my second. for the 1st, I had been playing regularly prior and had prepared well in advance. for the 2nd, I had zero prep, hadn't played in 10+ years, and was a walk-on replacement for another player who was a no-show.
I can rightly say I am a Diplomacy tournament champion.
here's my problem. The 1st, that I won, was during The Digital Dark Age. around 1990
The 2nd, the no-prep walk-on that I lost, was like 15 years later. When everything was getting put on the web somehow, Google-able and archived etc
my Diplomacy record therefore is Heisenbergian. I am a champion... except if you Google to confirm it. in which case you'll see that I lost
both things are true. paradox FUN caused by the Digital Dark Age
In 2008 I created a SaaS that shut down in 2015 (it had great LTV and super low churn, but I had no one doing the growth side and at the time I was too immature to raise a seed round). It turns out that today I don't even have the code in a proper format, and it would be valuable to have it as a reference, as lots of good ideas were used in the internal architecture. If I'm lucky, it's stored on some old DVDs that I have in a drawer.
Most paper, pen and pencil will outlive most digital media.
Most printed photos will outlive digital photos.
Most CD, DVD, VHS will outlive cloud stored videos.
The Dead Sea Scrolls lasted hundreds of years and yet 99.9% of other thoughts, ideas and songs have been lost. Maybe it's okay that things are forgotten, to be rediscovered in new ways. Maybe death and rebirth is what gives the world of knowledge and stories its vitality, and to grasp it forever would be to use the gravestone as your storage medium.
Oh I know. There is an easy solution to this. Just sell the information for Bitcoin and use Bitcoin as a store of value and then in the future buy the information back using the store of value feature of Bitcoin.
(Cough, someone still has to store the information.)
And yet I suspect that digital information involving identification and tracking for the purpose of serving ads won't be lost, because there's money to be made and control to be exerted.
Oh please. You'd have to be naive to think otherwise.
Why worry, in 100 years AI will create all the music one likes in a second, and if one wants sweet memories of the past, AI will create those too :) And inject something to extract sweet nostalgic tears.
What's the real story with the MySpace data? In all these years I don't think I've ever heard what really happened from anyone actually involved in the loss.
i don’t think people realise just how much of data degradation in the age of everything being documented online is a good thing. nobody wants to see your cringe from 20 years ago. the internet is a conduit for culture and much of it has to be erased and rebuilt with every generation, such is its nature
Similar to their Rosetta Stone disk: we could store enough data to decode just the first content, like a TOC, some intros, etc., plus some instructions on how to build a rudimentary device to read/decode the next, more compressed, digital bits of info, so that after accessing this second-level data, you could gradually build better devices to decode more of the library.
So some data can remain unretrievable without going through the first steps, but it would still be accessible.
I'd also like to highlight the need to preserve personal and business data stored in a variety of SaaS services. Google export is a great example of a large provider doing the right thing. I wish most smaller SaaS companies were similar.
Specifically, I've been recently looking for accounting software for Polish VAT accounting.
I'd much prefer standalone software rather than SaaS, but I found none. Only SaaS offerings. Of a dozen of them, only one has a public API one can use to export data (if one writes the exporting software). None have the ability to export all data in a simple format in one go.
Why does it matter? Accounting records, buy/sell invoices, expense documents etc gathered over years and decades can be very valuable.
Personally, despite being against over regulation in general, I think there should be some law that required SaaS companies to provide reasonable data export capability. GDPR already gives people an ability to request all data another entity has on them, but making it an essential part of the service that could be used regularly (as a backup of sorts) would be much better.
Photobucket recently asked for me to pay for service because they don't do free anymore. I was hosting an old, old forum signature and little else, it was a nice bit of nostalgia.
I made sure the backed-up copy on my machine was ok. Then I purged my bucket and am proceeding to delete my long-stagnant account.
Can't fault them for trying to make money, but it sure seems capitalism is a big fan of entropy. And it's good to remember that you see your data as an asset, while it's always a liability for a company providing services; free, paid, or otherwise.
If I didn't happen to have the same email, I wouldn't have received the account deletion email to even check what was in there (and reset my password). A whole lot of early social internet content is gonna get poofed quite soon.
Forget about digital dark age. How about tech dark age? Our tech supply chains are incredibly fragile. We saw what happened in Covid and that was just a virus.
With AI we have a real chance of 'losing the past', as in we won't be able to tell fact from fiction. The bar to modify images, video, and text from the past will be so low that everyone will do it. And on top of that, coming autonomous agents will be creating and modifying information at rates we just won't be able to keep up with.
I'd say we should sign everything we can now, anything created from 2023 onward is already suspect of being created by AI. The past will be 'erased' as well if there's no way to verify our historical information.
For example, I create an AI photo of Frank Sinatra in an LA diner eating a sandwich and post it online. Tell me how one can verify today whether that picture is legit or not. Who's the arbiter of all Frank Sinatra photos? How much time, effort, and money would it take to do that verification? Now extrapolate this example to everything. The past becomes only myth and legend.
It's kind of a fun thought, but the past will become more like the future: we can guess what probably happened, but we'll never really know for sure. Just like I probably know what will happen tomorrow, but I'll never be sure.
AI is going to give us the power to extrapolate information from the past like never before. With a few lines of text I may be able to create a feature film on Abraham Lincoln, how much of that will be 'frog DNA' spliced in to keep the story going? Which may be used as source data for something else, and so on. The past is already a game of telephone into the future. With the ability to fill in the blanks provided by AI, the signal to noise goes way down. Figuring out the actual real source information becomes a lot more valuable, but without locking down that source information today we may lose the ability to verify many pieces of information as sources versus AI creations in the future.
There is a difference. Humanity lies, but until now it could not retroactively mass-fabricate truth so convincing that even current-day skeptics had trouble seeing through it. When I say current day, I mean that in the past, a genocide was seen as such by at least part of the human population, but then that truth was erased by storytelling. Today, events are fabricated into existence by AI. You go to Google to find alternative resources and you can’t trust them either. You can’t verify because it is too costly to verify every single thing you see/hear/read. It’s a problem.
I can imagine a couple million years from now, some alien species shows up, we’re all gone and they think maybe we had wings, some of us were born with blue hair and others were half robots. I get they can study some of our remains, but so much of us is mutable digital info now.
It's good to remember that history is always written by the winning side, by necessity.
Also, they usually like to burn any -history- or -culture- of the losing side and adapt its customs to their new ones and call it a day, pretty much erasing history, or at least as much of it as they can.
- You usually can attribute history written by a victor to said victor;
- There's only so much control a victor has over what's being kept by the monks, librarians, museum curators and individuals, and what of it will resurface once they're gone.
With AI, we're not talking about alternative history, but rather about infinite, arbitrary alternative histories that can't be told apart from the real one.
>Every time we forget the old world, we reinvent it better
Historically that has not been the case for most of human history. As odd as it might seem, in general the rediscovery of the accomplishments of the ancient world has been a great driver of progress. The periods when the accomplishments of the past were lost and fully forgotten were the sorts of times people call Dark Ages.
Are we currently in a Digital Dark Age because the pace of content creation has far outstripped our ability to preserve that content for posterity?
Obviously we can’t be sure how the current era will be viewed from the far future, but your comment made me realize that the current situation has similarities to that dark age.
Responding to a statement about "for most of human history" with a reference to a very recent event isn't something you should begin with "actually" since it's not a reply to what I was saying, it's just your own tangent. Also, actually, even in the early modern period people were still looking to the past for inspiration even when they were reusing land - there's far more to the past than mere buildings.
You mean the cities that are most expensive and worst to live in, because they didn't have relics of the past putting brakes on the greed of real estate owners and developers?
Let me simplify then: modern cities suck. They're soul-crushing, sad places to live in. A lot of that has to do with modern construction and economic philosophy which you so praise.
Or we think we do, giving ourselves a much needed boost to ego and letting the cycle continue and keep everyone happy. Always assuming that we have made things "simple" because we know our way and not someone else's etc. That being said there's a lot of progress in the end but we take a lot of steps backwards to get there.