Also, a lot of surviving websites are no longer so accessible, e.g. YouTube and Reddit comments, or Panoramio pictures (which Google kept but you can't see, even if you took them). Even capturing a large number of tweets is pretty hard right now.
Companies are closing down all data they can grab. Youtube used to have a list of the top 1000 videos ordered by view count and other parameters. It allowed you to explore and filter their data. To choose what to see. To index their content. This is no longer possible. They want to keep everything they know about you private, and they want to control what you see and "discover".
This is pretty far away from the open web we had 10 years ago. Part of the blame is Javascript, but most of the blame is on users who don't care. Smartphones ruined the internet in my opinion. Sure, there's more content than ever, but it's not freely available. In fact you're not even allowed to know what's available.
This week I tried to read an (unfinished) fantasy novel on a forum, where it had been posted for a few years. I had already read a few chapters and it was very well-written. Except the author decided last month to remove all his content from the place (10000+ forum posts), which included this unfinished novel and two others. I tried to find backups on the Internet Archive, but only the first page of each thread was saved. I only found a few chapters using Google cache. And I don't think the mods will agree to restore the content. So yeah, I can relate to the idea that the internet isn't forever.
I got burned by this yesterday. I was looking for an old photo that was only on myspace, but apparently they had a huge content purge and it was long past the migration phase :(
Very true, the internet has a memory problem. If you try to research something that happened, let's say 10 years ago, using just the Internet, you'll have lots of trouble. If it wasn't printed in one of the major newspapers you're out of luck. Not only that, but you'll need to go through a truckload of data if you want to expand on what the newspapers have to say. I suspect that eventually no one will be able to look back at the net's past, since cryptography is becoming a must for everything on the net. If you're reading this and you encrypt your data, think about the data you encrypted 10 years ago. Can you get to it? Maybe. Think about someone else's. I bet you can't. What happened 10 years ago is blurry now. If that's the case now, then what will happen 100 years from now?
SRI-NIC wiped all the WHOIS NIC handles when they handed the job on to somebody else. Network Solutions I think. So, my 1985 NIC handle got wiped by the dotcom boom sometime between 1992 and 1998.
Archive.org is a great project that one can't recommend highly enough.
It's not "just" collecting huge amounts of data, it is a living archive. For example, when they publish old Amiga software, they publish it ready to use. And I don't just mean polished disk image files. I mean ready to use right in the browser. Click on an ancient game, and play it immediately in your browser!
That's exactly the kind of attitude towards archiving that we need. Like one of the more modern museums where you are allowed to touch (and interact with!) things.
I don't think IPFS and the like can solve the archival problem. They rely on the willingness of lots of people to donate lots of storage space indefinitely.
I don’t get why people think IPFS is only a volunteer thing. The same way bitcoin quickly evolved from people mining on laptops to professionally run datacenters, so too will providing resources for IPFS.
Filecoin will allow people to monetize making content on IPFS available, with the ability to prove the content meets certain availability guarantees via smart contracts.
Making content available via IPFS will become the new way to mine cryptocurrency.
But there are other protocols incoming too, like Swarm for Ethereum, Storj, etc. Some of them are already available, just like IPFS.
Not an apples-to-apples comparison—IPFS is the only one that’s an offline first, peer-to-peer, distributed versioned file system. Swarm is interesting but it’s for small storage for smart contracts; it’s not a general purpose, low-cost storage option for storing terabytes of data. You’re not going to take a snapshot of Wikipedia on it, for example: https://ipfs.io/blog/24-uncensorable-wikipedia/
I doubt IPFS will automatically win this because it was the first to make a glorified BitTorrent client available via HTTP.
This statement doesn’t make sense—IPFS is designed to replace HTTP, not run on top of it. While it shares some similarities to BitTorrent like using a DHT for content addressing, it’s really a different thing.
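To make "content addressing" concrete, here's a toy sketch in Python. It is not how IPFS is actually implemented (IPFS uses multihash-based CIDs, chunks files into a Merkle DAG, and finds blocks through a DHT); the `ContentStore` class and its in-memory dict are made up purely for illustration. The point is only that the address is derived from the bytes themselves, so anyone who fetches a block can verify it, no matter which peer served it.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, for illustration only).

    The address of a block is the SHA-256 hex digest of its bytes.
    Real IPFS uses CIDs and a DHT instead of a local dict.
    """

    def __init__(self):
        self._blocks = {}  # address -> raw bytes

    def put(self, data: bytes) -> str:
        # The address depends only on the content, not on who stores it.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on read: a tampered block no longer matches its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("block does not match its address")
        return data


store = ContentStore()
addr = store.put(b"a snapshot of some page")
print(addr, store.get(addr))
```

Storing the same bytes twice gives the same address, which is why identical content is deduplicated for free, and also why changing content means getting a new address.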
>Swarm is interesting but it’s for small storage for smart contracts; it’s not a general purpose, low-cost storage option for storing terabytes of data.
Swarm is intended to do exactly that.
IPFS, atm, does not store terabytes of data. If I were to just dump in my data, it would be unusable within the hour as there is no incentive for nodes to keep those terabytes active and around.
Swarm already hosts static websites; snapshots of Wikipedia are feasible.
>IPFS is designed to replace HTTP, not run on top of it. While it shares some similarities to BitTorrent like using a DHT for content addressing, it’s really a different thing.
IPFS does not address dynamic content properly. IPNS is way too slow to allow websites on the scale of Google to operate sensibly. I doubt IPNS could handle a decently sized subreddit in terms of activity.
IPFS is unlikely to replace HTTP since both protocols address different problems. However, as it is usable today, IPFS is little more than a cache that can store some data for a bit until nobody is interested in it.
Last I recall, Filecoin is not IPFS, it merely works on top of IPFS. You may revise your argument and replace every occurrence of "IPFS" with "Filecoin", in which case it would still compete with Swarm, Storj, etc.
I don't think the internet was ever supposed to be forever. Sometimes it looks like it's forever because people copy content from one public place to another, but this behavior is not inevitable.