Hmm, I've been playing with IPFS lately, and just had an idea: Since IPFS is per...

rakoo · on July 17, 2017

ArchiveTeam discussed backing up the Internet Archive a long time ago in their project called INTERNETARCHIVE.BAK (http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK). They decided to go with git-annex because its creator was a cofounder of ArchiveTeam and was willing to work with that project, which is still waiting for IPFS's proposal on how to do this (http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/i...).

You can participate in the effort, of course. Have a few hundred GB and a good connection? Head over to http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/g... and follow the steps!

cyphar · on July 17, 2017

ArchiveTeam is putting the files in the Internet Archive archives. While IPFS is great, I'm not sure I agree it's good for archival, because it depends on the availability of the IPFS network. The Internet Archive does work to make sure that there are sufficient backups so that they can recreate their archive.

mmjaa · on July 17, 2017

As long as there is a single party interested in content hosted on IPFS, the "IPFS Network" will persist.

It would absolutely be the best move for IPFS to be used in this case - maybe something like the AkashaApp guys, albeit for audio-media.

Edit: The Akasha App for those who aren't yet familiar with it - https://akasha.world - brings together IPFS and Ethereum to make a truly distributed peer network for persistent content.

snsr · on July 17, 2017

> As long as there is a single party interested in content hosted on IPFS, the "IPFS Network" will persist.

I imagine that's one of the reasons why it's not ideal for archival content.

roblabla · on July 17, 2017

I don't get why? It's exactly the same today: you need at least one party to host the data (in this case the archive team). With IPFS, however, if more parties wish to host it, it will lighten the load.

tscs37 · on July 18, 2017

There is no incentive to keep data in the network.

You can pay pinning services, but what's the point if you're just going to pay someone to host it?

I'd rather have this archived on Siacoin, Storj, Swarm or any other distributed network with actual incentives to keep things around.

mmjaa · on July 18, 2017

IPFS+Ethereum = incentive. Please do not be so flippant as to reject something until you've grok'ed it well enough to argue against it. If you have looked at IPFS+Ethereum and found it wanting, I'd love to know what exactly, because from my perspective this is precisely the kind of technology that delivers your stated requirements.

tscs37 · on July 18, 2017

I've definitely "grok'ed" it.

Which is why I find Swarm a better solution. It is literally IPFS+Ethereum with additional support for ENS lookups, deniable storage, redundant storage, etc. This allows for far better privacy and makes it possible to compensate for the loss of parts of a file, both features lacking in IPFS itself.

In my experience, the current Swarm testnet performs better than IPFS in terms of bandwidth and latency.

http://swarm-gateways.net/bzz:/theswarm.eth/

e12e · on July 17, 2017

I think ipfs is very interesting, and will hopefully evolve to become a viable archive solution - but that'll only happen once you have a few serious organizations running their own redundant ipfs "intranet" storage networks, with geographical redundancy (or at least good local availability). Then we might see "archive peering" between such organizations.

The time horizon of "archival storage" probably starts at a hundred years - that will need some structure to have a likelihood of persisting for so long.

cyphar · on July 17, 2017

> The time horizon of "archival storage" probably starts at a hundred years - that will need some structure to have a likelihood of persisting for so long.

This reminds me of one of the principles of camlistore's data schema, which explicitly says that they made their schemas overly-explicit so that future digital archeologists can re-create the schema purely from examples[1].

It's a shame that camlistore feels more like a very long experiment than a polished and usable backup/archive system.

[1]: https://camlistore.org/doc/principles ; There used to be a more explicit explanation, but I couldn't find it.

_69no · on July 17, 2017

> Since IPFS is perfect for archival

I don't believe this is true. IPFS doesn't have any built-in way of easily distributing parts of an archive, doesn't support (as far as I know) any form of erasure coding, which makes overhead quite high, and requires that you use its own weird block + hash scheme for integrity.

It's also very immature: we don't know if IPFS will be around in 10 years, and we don't know what kinds of bugs it will have.

IPFS is a great tool and it has its uses but I don't think archiving is one of them yet.
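
For anyone unfamiliar with why missing erasure coding drives overhead up, here is a minimal sketch using a single XOR parity block. This is the simplest possible erasure code and is only an illustration; it is not how IPFS, Swarm, or any other system mentioned in this thread actually stores data, and the function names (`encode`, `recover`) are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal-size blocks (zero-padded) plus one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(shards, missing: int):
    """Rebuild the single missing shard by XOR-ing all surviving shards."""
    return xor_blocks([s for i, s in enumerate(shards) if i != missing])


shards = encode(b"hello archive world", k=4)
# Pretend shard 2 was lost; the other four are enough to rebuild it.
assert recover(shards, 2) == shards[2]
```

With k=4, the five shards cost 1.25x the original size and survive the loss of any one shard. Getting the same guarantee from plain replication means keeping a full second copy, i.e. 2x.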

unicornporn · on July 17, 2017

IPFS would not be considered suitable for digital preservation. Have a look at LOCKSS[1].

[1] https://www.lockss.org/

diggan · on July 17, 2017

Interested in understanding why you think it's not suitable for digital preservation? Feels like something that uses content-addressing and a P2P network is perfect for digital preservation.
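
To make the content-addressing point concrete, here is a minimal sketch, assuming a plain SHA-256 hex digest as the address. IPFS's real identifiers (multihash-based CIDs over chunked data) are more involved and are not reproduced here; the `Store` class is hypothetical and stands in for any untrusted peer.

```python
import hashlib


def address(data: bytes) -> str:
    """Derive the address of a blob from its content alone."""
    return hashlib.sha256(data).hexdigest()


class Store:
    """Hypothetical in-memory blob store standing in for an untrusted peer."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = address(data)
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.blobs[key]
        # The address doubles as an integrity check: re-hash and compare.
        if address(data) != key:
            raise ValueError("content does not match its address")
        return data


store = Store()
key = store.put(b"a page from the archive")
assert store.get(key) == b"a page from the archive"
```

The useful property for preservation is that whoever fetches the data can verify it without trusting the peer that served it. What it does not give you is any guarantee that some peer still holds the data.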

xenophonf · on July 17, 2017

In a P2P network, the availability/integrity of the archived material depends on storage nodes not under the direct control of the archivist. I might trust a P2P network for content re-distribution, but I wouldn't trust it for long-term storage.

nske · on July 17, 2017

You don't have to. If you would be hosting the archived material yourself anyway, I can see no downside in doing it through a P2P network. Long-term storage could still depend on you, and any other users helping to host any of the content for any amount of time would just be a bonus.