
Hmm, I've been playing with IPFS lately, and just had an idea: Since IPFS is perfect for archival, Archiveteam could put their files on IPFS, and users could help out by pinning stuff on their local nodes. For example, I could ask their website to give me a 10 GB list of files to pin (if I wanted to "donate" 10 GB to them), and I'd keep them available.

The only problem is that I don't know whether IPFS has any way to gauge availability, so I'm not sure if the team could tell which files were only hosted by a few people.
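To make the "give me 10 GB to pin" idea concrete, here is a minimal sketch of how a coordinator might choose files for a volunteer, assuming it already has some estimate of how many nodes hold each file. That estimate is exactly the part in question above, so the `providers` field is an assumed input, and `ArchiveFile`, `pick_files_to_pin` and the `cid-*` strings are hypothetical names for illustration, not anything from IPFS or ArchiveTeam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveFile:
    cid: str         # placeholder identifier, not a real IPFS CID
    size_bytes: int
    providers: int   # assumed input: how many nodes are known to hold the file


def pick_files_to_pin(catalog, budget_bytes):
    """Greedy choice: rarest files first, skipping any that no longer fit."""
    chosen = []
    remaining = budget_bytes
    # Sort by provider count, then by size, so ties go to the smaller file.
    for f in sorted(catalog, key=lambda f: (f.providers, f.size_bytes)):
        if f.size_bytes <= remaining:
            chosen.append(f)
            remaining -= f.size_bytes
    return chosen


GB = 10**9
catalog = [
    ArchiveFile("cid-a", 4 * GB, providers=1),
    ArchiveFile("cid-b", 7 * GB, providers=2),
    ArchiveFile("cid-c", 5 * GB, providers=3),
    ArchiveFile("cid-d", 1 * GB, providers=9),
]

pledge = pick_files_to_pin(catalog, 10 * GB)
print([f.cid for f in pledge])
```

A greedy pass like this is not an optimal packing, but it is simple and biases donated space toward the files with the fewest known hosts.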



ArchiveTeam discussed backing up the Internet Archive a long time ago, in their project called INTERNETARCHIVE.BAK (http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK). They decided to go with git-annex because its creator was a cofounder of ArchiveTeam and was willing to work on the project, which is still waiting for IPFS's proposal on how to do this (http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/i...).

You can participate in the effort, of course. Have a few hundred GB and a good connection? Head over to http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/g... and follow the steps!


ArchiveTeam is putting the files in the Internet Archive's archives. While IPFS is great, I'm not sure I agree it's good for archival, because it depends on the availability of the IPFS network. The Internet Archive does work to make sure that there are sufficient backups that they can recreate their archive.


As long as there is a single party interested in content hosted on IPFS, the "IPFS Network" will persist.

It would absolutely be the best move for IPFS to be used in this case - maybe something like the AkashaApp guys, albeit for audio-media.

Edit: The Akasha App for those who aren't yet familiar with it - https://akasha.world - brings together IPFS and Ethereum to make a truly distributed peer network for persistent content.


> As long as there is a single party interested in content hosted on IPFS, the "IPFS Network" will persist.

I imagine that's one of the reasons why it's not ideal for archival content.


I don't get why. It's exactly the same today: you need at least one party to host the data (in this case the archive team). With IPFS, however, if more parties wish to host it, it will lighten the load.


There is no incentive to keep data in the network.

You can pay pinning services, but what's the point if you're just going to pay someone to host it?

I'd rather have this archived on Siacoin, Storj, Swarm or any other distributed network with actual incentives to keep things around.


IPFS+Ethereum = incentive. Please do not be so flippant as to reject something before you've grok'ed it well enough to argue against it. If you have looked at IPFS+Ethereum and found it wanting, I'd love to know what exactly is missing, because from my perspective this is precisely the kind of technology that delivers your stated requirements.


I've definitely "grok'ed" it.

Which is why I find Swarm a better solution. It is literally IPFS+Ethereum with additional support for ENS lookups, deniable storage, redundant storage, etc. This allows for far better privacy and makes it possible to compensate for the loss of parts of a file, both features lacking in IPFS itself.

In my experience, the current Swarm testnet performs better than IPFS in terms of bandwidth and latency.

http://swarm-gateways.net/bzz:/theswarm.eth/


I think ipfs is very interesting, and will hopefully evolve to become a viable archive solution - but that'll only happen once you have a few serious organizations running their own redundant ipfs "intranet" storage networks, with geographical redundancy (or at least good local availability). Then we might see "archive peering" between such organizations.

The time horizon of "archival storage" probably starts at a hundred years - that will need some structure to have a likelihood of persisting for so long.


> The time horizon of "archival storage" probably starts at a hundred years - that will need some structure to have a likelihood of persisting for so long.

This reminds me of one of the principles of camlistore's data schema, which explicitly says that they made their schemas overly-explicit so that future digital archeologists can re-create the schema purely from examples[1].

It's a shame that camlistore feels more like a very long experiment than a polished and usable backup/archive system.

[1]: https://camlistore.org/doc/principles ; There used to be a more explicit explanation, but I couldn't find it.


> Since IPFS is perfect for archival

I don't believe this is true. IPFS doesn't have any built-in way of easily distributing parts of an archive; it doesn't support (as far as I know) any form of erasure coding, which makes overhead quite high; and it requires that you use its own weird block + hash scheme for integrity.

It's also very immature: we don't know if IPFS will be around in 10 years, and we don't know what kinds of bugs it will have.

IPFS is a great tool and it has its uses but I don't think archiving is one of them yet.
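For a sense of why the lack of erasure coding translates into overhead, here is a rough back-of-the-envelope sketch comparing whole-file replication with an idealized erasure code. It assumes a code where any `data_shards` out of `data_shards + parity_shards` shards are enough to rebuild the file (the property Reed-Solomon codes have); the function names and the 10+4 layout are just illustrative choices, not anything IPFS or ArchiveTeam uses.

```python
def replication(copies):
    """Return (storage multiplier, number of lost copies that can be survived)."""
    return float(copies), copies - 1


def erasure_code(data_shards, parity_shards):
    """Return (storage multiplier, number of lost shards that can be survived).

    Assumes an idealized code where any `data_shards` of the
    `data_shards + parity_shards` shards reconstruct the original.
    """
    overhead = (data_shards + parity_shards) / data_shards
    return overhead, parity_shards


for label, (overhead, tolerated) in [
    ("3 full copies", replication(3)),
    ("10 data + 4 parity shards", erasure_code(10, 4)),
]:
    print(f"{label}: {overhead:.1f}x storage, survives {tolerated} losses")
```

Under those assumptions, three full copies cost 3x the storage and survive two losses, while the 10+4 layout costs 1.4x and survives four lost shards. The comparison ignores repair traffic and the coordination needed to place shards on independent nodes.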


IPFS would not be considered suitable for digital preservation. Have a look at LOCKSS[1].

[1] https://www.lockss.org/


Interested in understanding why you think it's not suitable for digital preservation? Feels like something that uses content-addressing and a P2P network is perfect for digital preservation.
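For anyone unfamiliar with why content-addressing is attractive here: the address is derived from the bytes themselves, so anyone who fetches the data can check it was not altered, no matter which peer served it. Below is a deliberately simplified sketch using a plain SHA-256 hex digest as the address. Real IPFS CIDs use multihash encoding and split files into blocks, so `address_of` and `ContentStore` are hypothetical stand-ins, not the IPFS format or API.

```python
import hashlib


def address_of(content: bytes) -> str:
    """Derive the address from the content itself (plain SHA-256 here)."""
    return hashlib.sha256(content).hexdigest()


class ContentStore:
    """Toy in-memory store keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        addr = address_of(content)
        self._blobs[addr] = content
        return addr

    def get(self, addr: str) -> bytes:
        content = self._blobs[addr]
        # Re-hash on read: corrupted or tampered data no longer matches its address.
        if address_of(content) != addr:
            raise ValueError("content does not match its address")
        return content


store = ContentStore()
addr = store.put(b"archived page")
print(addr[:16], store.get(addr))
```

Integrity checking is the part this gives you for free. It says nothing about whether any node still holds the bytes, which is the availability concern raised elsewhere in the thread.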


In a P2P network the availability/integrity of the archived material depends on storage nodes not under the direct control of the archivist. I might trust a P2P network for content re-distribution, but I wouldn't trust it for long term storage.


You don't have to. If you would be hosting the archived material yourself anyway, I can see no downside to doing it through a P2P network. Long-term storage could still depend on you, and any other users helping with any of the content for any amount of time would just be a bonus.



