So, the same question and point of criticism that I've brought up in past discussions about IPFS, and that so far has not been sufficiently answered by anybody:
The claim is that IPFS could replace HTTP, the web, and so on. The only thing I see, however, is a distributed filesystem, which is only one part of the puzzle. Real-world applications require backend systems with access control, mutable data, certain information being kept secret, and so on - something that seems fundamentally at odds with the design of IPFS.
How would IPFS cover all of these cases? As it stands, it essentially looks to me like a Tahoe-LAFS variant that's more tailored towards large networks - but for it to "replace HTTP", it will have to not only cover every existing HTTP use case, but also do so without introducing significant complexity for developers.
Seriously, I'd like to see an answer to this, regardless of whether it's a practical solution or a remark along the lines of "this isn't possible". I'm getting fairly tired of the hype around IPFS, with seemingly none of its users understanding the limitations or how it fits (or doesn't fit) into existing workflows. I can't really take it seriously until I see technical arguments.
IPNS is used as the mutable state driving 'dynamic' content. The dynamic content is still published as a normal immutable reference, but is referenced by an IPNS name/path/hash. To me this looks like a CNAME in DNS: IPNS provides the pointer to the immutable hash, and for dynamic content you would only ever use the IPNS name.
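To make the CNAME analogy concrete, here's a toy model of that split (my own illustration, not the actual IPFS API; real IPNS names are derived from a keypair and the records are signed): content lives in an immutable, content-addressed store, and a separate mutable name record gets repointed at the newest hash.

    import hashlib

    store = {}   # content hash -> content (immutable, like /ipfs/<hash>)
    names = {}   # name -> content hash    (mutable, like /ipns/<name>)

    def add(content: bytes) -> str:
        """Add immutable content; its address is simply its hash."""
        h = hashlib.sha256(content).hexdigest()
        store[h] = content
        return h

    def publish(name: str, content_hash: str) -> None:
        """Repoint the mutable name at a new immutable hash (CNAME-like)."""
        names[name] = content_hash

    def resolve(name: str) -> bytes:
        return store[names[name]]

    publish("my-site", add(b"<h1>v1</h1>"))
    publish("my-site", add(b"<h1>v2</h1>"))   # old version still exists under its own hash
    print(resolve("my-site"))                 # b'<h1>v2</h1>'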
For authorization, this could be done via PKI. Take some data, encrypt it with one or more users' public keys, then post it to IPFS. Then the recipient, and only the recipient, would be able to access the content.
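A minimal sketch of what that could look like using PyNaCl sealed boxes (the library choice is mine; IPFS doesn't prescribe any of this) - the ciphertext is what would get added to IPFS, and only the holder of the matching private key can read it:

    from nacl.public import PrivateKey, SealedBox

    # Recipient generates a keypair; the public key is shared, the private key is not.
    recipient_key = PrivateKey.generate()

    # Sender encrypts to the recipient's public key before adding the blob to IPFS.
    ciphertext = SealedBox(recipient_key.public_key).encrypt(b"secret feed entry")
    # ... add <ciphertext> to IPFS: anyone can fetch it, nobody else can read it.

    # Recipient fetches the block by hash and decrypts it locally.
    plaintext = SealedBox(recipient_key).decrypt(ciphertext)
    assert plaintext == b"secret feed entry"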
> I can't really take it seriously until I see technical arguments.
Well, to be fair, the technical arguments appear to exist (after all, there is an IPFS draft document which covers the stuff I mention above), and while something like replacing HTTP is obviously extreme, those arguments appear (on their face) to not be completely crazy. So if there are specific technical issues that aren't covered by the draft documents and other documentation on the IPFS site, the GitHub FAQ page, etc., then maybe bring them up and people can help you answer those questions.
Let's separate the use cases: we want a permanent, public web, and we want applications that don't die when servers get shut down. IPFS clearly solves the first use case, so let's talk about the second.
In this context, IPFS is part of the toolkit for building decentralized applications. The architecture of these applications is fundamentally different from today's apps. Users control the data, not the application developer. Private data can live on the user's machine or some cloud service they trust to hold the data for them. IPFS probably isn't a good fit for private data, but public apps like a decentralized Twitter will probably use it for all of their data. Encrypting private data and storing it with IPFS will sometimes make sense.
To deal with mutable data, it depends what your goals are. If users don't need to agree with each other about the value of the data, they can store it however they want and maintain their own view of the world. If they need to agree on data that only one person can change, then you can use IPNS to manage mutable references controlled by one keypair. If users need to agree on data about their interactions with each other, then they need to use a decentralized cloud computer that can enforce rules on a historical record. We call these enforcement clouds "blockchains," and Ethereum is a blockchain that defines a process for defining new rules so application developers don't have to build their own blockchains.
Building decentralized apps with IPFS and Ethereum is the most exciting work I've ever done. It doesn't just improve the apps themselves, it fundamentally changes the economics of our industry. Most businesses are built on network effects, but decentralized apps don't bottle up the value of their networks: they give them to their users.
That's more like the world I want to live in. If you want to be a part of this change, you should join us at ConsenSys, or join the decentralization movement in general.
I've recently been considering building an application that runs on a backend of Ethereum+Storj. The ability to store (and retrieve) data in Storj feels like a necessary component in getting a full DApp set up. Right now I'm grappling with a concept that would (somehow) require the ability to provide users with a single-use key to download data from Storj.
The future of cryptocurrencies is bright; there's now a large set of tools in the toolbox for building resilient, permanent applications for people to use. I'm really excited to see what people build in 2016 - the cryptocurrency ecosystem has been growing rapidly in the shadows, and it's finally breaking into the mainstream.
Also, the guys working on Ethereum's Swarm think they've figured out a good incentive system for it, and plan to publish an "orange paper" in the near future.
> and we want applications that don't die when servers get shut down.
Easy. Do not make sites and claim they are apps, when clearly they are acting as graphical terminals to a remote service.
An app should be something that runs locally, stores the result locally, and optionally stores things remotely as a backup measure.
The whole "let's do all the heavy lifting out there" approach is a throwback to the leased-terminal era. And as best I recall, the personal computer became popular because accountants would rather run their spreadsheet (even if slower and with severe data limits) locally than have to grovel to the admins for mainframe time.
You draw a distinction between public and private, specifically:
> IPFS probably isn't a good fit for private data, but public apps like a decentralized Twitter will probably use it for all of their data. Encrypting private data and storing it with IPFS will sometimes make sense.
This is actually a great example of where I can't see the IPFS model working, because not everything is black-and-white private-or-not.
For Twitter, for example, you have private accounts - this doesn't mean that nobody can see it, it just means that you need to have permission to see it. How do you implement an access control system that:
- Is mutable (ie. allowing new users to see your feed)
- Is private (ie. there is no public list of who can see your feed)
- Is revokable (ie. you can remove people's access to your feed)
... without having to re-encrypt everything using a secret set of keypairs every time the ACL changes, which is an unreasonable burden on resources?
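To make the burden concrete, here's a rough sketch of the usual key-wrapping approach (my own illustration, using PyNaCl; nothing IPFS-specific): the content is encrypted once under a symmetric key, and that key is wrapped per reader. Granting access is cheap, but genuinely revoking it means rotating the content key and re-encrypting everything the revoked reader could already decrypt.

    from nacl.public import PrivateKey, SealedBox
    from nacl.secret import SecretBox
    from nacl.utils import random

    # Encrypt the feed once under a symmetric content key...
    content_key = random(SecretBox.KEY_SIZE)
    feed_ciphertext = SecretBox(content_key).encrypt(b"private tweets...")

    # ...and wrap that key for every reader; this per-reader table is the ACL.
    readers = {"alice": PrivateKey.generate(), "bob": PrivateKey.generate()}
    wrapped = {name: SealedBox(key.public_key).encrypt(content_key)
               for name, key in readers.items()}

    # Granting access is cheap: wrap the same content key for one more person.
    readers["carol"] = PrivateKey.generate()
    wrapped["carol"] = SealedBox(readers["carol"].public_key).encrypt(content_key)

    # Revoking is not: "bob" already holds content_key, so you have to rotate it,
    # re-encrypt all the content, and re-wrap the new key for everyone but him.
    readers.pop("bob")
    content_key = random(SecretBox.KEY_SIZE)
    feed_ciphertext = SecretBox(content_key).encrypt(b"private tweets...")
    wrapped = {name: SealedBox(key.public_key).encrypt(content_key)
               for name, key in readers.items()}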
I understand that you build applications on a different kind of architecture, and I'm all for usable decentralized primitives, but there have to be explicit solutions for these kinds of 'business requirements' (whether commercial or not) for it to ever be able to "take over".
As someone who does devops for a living, this feels like it could be a replacement for S3 inside an AWS VPC, without connecting to anything outside of your private network. You could just have nodes register with Consul when they come up.
Searching public data is already decentralized: anyone can run a web crawler and search engine. Search engines, recommendation systems, and AI assistants that use private data are harder to decentralize, but they're also where all the gains come from: I'm stuck with Google unless I want to move all my data somewhere else, and that's hard.
To decentralize private search, we'll build protocols for publishing your private data to cloud services that represent your interests. For some purposes, you'll publish data to services that keep your data isolated. You might even publish that data to your own machines. This works for email and cloud storage like Dropbox. Other purposes require data to be combined to add value, like recommendation systems and AI doctors. Services will compete with each other for your trust so you'll pick them. The bigger they are, the more useful they will be, so they're vulnerable to collapsing back into a monopoly. Anti-monopolists will give their business to competitors to keep them alive.
> Searching public data is already decentralized: anyone can run a web crawler and search engine.
That is nice in theory, but not how it works in practice. Searching public data generally involves a significant amount of investment, both financially and effort-wise, to have anything approaching a usable search engine, even for relatively small sets of data.
Counter-example: Git is an immutable object store successfully replacing mutable CVS databases. The trick is to build mutable functionality on top of an immutable store. In the case of Git, branches/tags are lightweight mutable pointers to immutable objects. Additionally, I believe access control and privacy can be built on top of an immutable store using cryptography.
You've nailed it. At Peergos [1], we are doing exactly that, with privacy as our primary focus. We layer encryption on top of IPNS + IPFS to get a distributed, private, access-controlled filesystem + social network that hides your friendship graph. IPFS is a wonderful model to work with.
"Using cryptography" is (like some of the other things mentioned) nice in theory, but has significant drawbacks as well.
If you've ever worked with stateless servers using something like JWT, you will have run into the issue of 'stale data', essentially the cache invalidation problem. Trying to solve ACLs with cryptography is prone to the exact same issues, as well as overhead problems. The crypto itself is immutable.
EDIT: In fact, Git is a great example of how you can't really do access control very easily in such a setup. Have you ever tried restricting access to a specific branch in a repository?
I don't think IPFS is trying to replace HTTP; it's trying to replace URLs. IPFS's goal is to offer universal content-centric resource location as an alternative to the Web's host-centric resource location. This is a nice thing because it decouples content from hosting--you can get the content from any reachable peer who has it. A good chunk of Team IPFS's engineering efforts have been put towards offering a wide variety of protocols to efficiently resolve a content hash to the content in a wide variety of contexts, covering things like peer discovery, content hash announcements, hash naming (IPNS), super-peers (ipfs-cluster), pub/sub, and so on.
IPFS is definitely not a filesystem in the classical (POSIX-y) sense. I think the naming is unfortunate, because it implies that IPFS has things like a global root directory, permission bits, users, ACLs, immutable human-meaningful names that resolve to mutable content, a POSIX-y consistency model, etc. IPFS does none of these things - it defers them to layers above it. All IPFS is concerned with is providing a Merkle DAG abstraction, and the necessary network protocols to walk it and fetch the underlying content.
Yes, and that's the problem - because it cannot address many of the features required for dynamic content. I'm sure it works fine for static content, but that's the easy problem to solve (and it has been solved before).
You're right, IPFS does not try to address dynamic content. I don't speak for the authors, but I think their opinion is that doing so is out of scope. I think they expect the application to represent each piece of mutable content as a Merkle DAG, and have the application acquire each user's written version of the content from IPFS, merge them, and announce the "latest" version under a new hash.
> Real-world applications require backend systems with access control, mutable data, certain information being kept secret, and so on - something that seems fundamentally at odds with the design of IPFS.
I don't think that's at odds with the design.
If you want to keep something secret, you encrypt it. If you want to manage access control, you sign it. A client application can discard information that cannot be decrypted, or isn't signed by an authorised key.
You don't get atomicity guarantees of course, but you can often get away without them. I don't see a system like IPFS as a replacement for all of the web, just a sizeable percentage of it.
The technical answers exist. Please look for them! I'd love to write personalized answers for every person that asks, but I'd better spend my time developing. Also, you should look at the rest of the discussions too, not just my compressed answers. Be part of the discussion, raise your concerns, and have them answered.
For the sake of saving you time, some short / compressed answers:
- Mutable data: look into IPNS (in the IPFS paper or repos linked above) and look into CRDTs. CRDTs can layer cleanly on top of IPFS for distributed mutable state (see the toy sketch after this list). If you haven't seen CRDTs: http://hal.upmc.fr/inria-00555588/document
- Access controls / certain info remaining secret: data encryption and capabilities. You mention Tahoe-LAFS - yep! We can (and will) build a cap system similar to it on top of raw IPFS objects (ideally actually collaborating with the excellent Tahoe-LAFS team directly). Oh, and many users want easy selective disclosure for encrypted multiparty records; we've not built this in yet because we have other priorities right now. If you'd like to contribute to this, we'd love the help!
- "it will have to not only cover every existing HTTP use case" -- we don't aim to cover _every_ HTTP facility; instead we can power a nicer model for distributed data, where you replicate data structures directly. This looks a lot closer to the data models people use in apps today (think single-page app models: {backbone, react, meteor, ...}); these are layered on top of APIs like REST, but can trivially layer over IPFS too.
- Pub/sub: you didn't mention it, but it is necessary for fast (subsecond) updates to mutable data in the large (>millions of nodes). We're working on these designs now, with implementations coming in a few weeks/months. (Some simple, non-scalable versions have already been proposed and implemented to get started, but the long-term solutions are coming later.)
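Since CRDTs keep coming up, here's a toy grow-only counter showing why they layer cleanly over an immutable store (my own sketch, not IPFS code): each replica's state is plain data that could be published as an immutable object, and any two states merge deterministically without coordination.

    # Toy G-Counter CRDT: state is a per-replica counter map; merge is element-wise
    # max, so replicas can publish snapshots (e.g. as IPFS objects) and still converge.
    def increment(state: dict, replica_id: str) -> dict:
        new = dict(state)
        new[replica_id] = new.get(replica_id, 0) + 1
        return new

    def merge(a: dict, b: dict) -> dict:
        return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

    def value(state: dict) -> int:
        return sum(state.values())

    a = increment(increment({}, "node-a"))  if False else increment(increment({}, "node-a"), "node-a")
    a = increment(increment({}, "node-a"), "node-a")   # node A counted twice
    b = increment({}, "node-b")                        # node B counted once
    assert value(merge(a, b)) == value(merge(b, a)) == 3   # merge order doesn't matter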
I ask that you please take a deeper look at the communication channels of our community, and that you voice your concerns + questions in our FAQs and other relevant repos. That way we can improve our models and systems based on your input. And if you have time to help us implement some ideas, join us! :)
I will have a look into these things later when I get around to it, but I just want to address a few things upfront:
> you mention Tahoe-LAFS, yep!, we can (and will) build a cap system similar to it on top of raw ipfs objects (ideally actually collaborating with the excellent Tahoe-LAFS team directly).
The cap system that Tahoe-LAFS uses is great, but it only serves a limited subset of use cases. It is not possible to revoke access, for example.
Depending on implementation it may be possible to accomplish this (to a degree) by having a separate mutable pointer for each person who has access to the data - and simply not updating their pointer to newer versions once their access has been revoked - but this still doesn't cover the "oops, accidentally gave them access, let's hope they didn't notice and revert it" use case.
(EDIT: It's not even possible to delete files on demand in Tahoe-LAFS, at the moment.)
> This looks a lot closer to the data models people use in apps today (think single-page app models: {backbone, react, meteor, ...}); these are layered on top of APIs like REST, but can trivially layer over IPFS too.
That architecture has actually turned out to be flawed in quite a few ways - in many implementations, it has turned out to be considerably harder to build on top of them than on more 'traditional' architectures. SPAs are common primarily due to hype, not because they are the superior technical solution.
They also come with some pretty serious (and in my view, unacceptable) trade-offs, like requiring JavaScript to be able to use them.
The "FS" part of IPFS can be misleading: you can store arbitrary data in IPFS.
The tooling merely provides a means of viewing Merkle DAGs (the base data structure) as a file hierarchy because it's a pretty clean 1:1 mapping that's familiar to many people.
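A rough sketch of that mapping (heavily simplified - real IPFS objects use a specific wire format and multihashes): a 'directory' is just a DAG node whose named links point at the hashes of other nodes, and the directory's own hash covers those links, which is why the file-hierarchy view falls out naturally.

    import hashlib, json

    store = {}  # hash -> serialized node (toy content-addressed store)

    def put_node(data: bytes = b"", links: dict = None) -> str:
        """Store a toy Merkle DAG node; its address is the hash of data + links."""
        node = {"data": data.hex(), "links": links or {}}
        raw = json.dumps(node, sort_keys=True).encode()
        h = hashlib.sha256(raw).hexdigest()
        store[h] = raw
        return h

    hello = put_node(b"Hello World\n")
    testing = put_node(b"testing 123\n")
    # A "directory" is a node with no data of its own, only named links.
    root = put_node(links={"hello.txt": hello, "testing.txt": testing})
    print(root)   # changing either file changes its hash, and therefore the root's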
> How would IPFS cover all of these cases? Seriously, I'd like to see an answer to this...
This is illogical, as evidenced by the illogical replies you are getting below. You are asking in a "serious" way for the IPFS community, developers or Juan to explain how all current use cases would work on IPFS.
That's a pretty tall order for an Open Source project that (seriously) owes you nothing.
I would suggest you attempt to run an IPFS node yourself and write a few applications and tell us how it goes.
It's a question I'm asking because that is the claim that is being made. You should address this reply to those making the claim, not to the person questioning it.
EDIT: Also, please drop the "owes you nothing" argument, I'm really growing sick of it. If I see something generating undeserved hype with potentially bad consequences for 'society at large', then I will damn well call it out. If you don't like it, don't reply to it. It has precisely nothing to do with anybody owing me anything.
The name in itself shows this is not a viable concept at all: a global filesystem? The web was all about getting rid of file abstractions and introducing hypertext. The world (and the net) isn't just about data; it's about code, security mechanisms, etc. Filesystems are not part of the puzzle, they are just old abstractions the computing world hasn't gotten rid of yet.
IPFS is not a replacement of the web by a filesystem; it is a replacement of the web's current platform - centralized hosting - with a new platform: a decentralized global filesystem. The usual web technologies - hyperlinks, dynamic web pages run by scripts - sit on a layer above and can remain in use.
I really like that IPFS is trying to change the way we think about the internet and HTTP. That being said, I'm very skeptical of a lot of the design choices. It seems like it's just trying to incorporate a lot of the latest buzzword technologies without any real consideration why. I get that blockchain, Git, BitTorrent are all powerful but that doesn't mean that mixing them all together into IPFS is going to be useful. Most likely it will end in a sort of internet Frankenstein's monster: overly complicated and lacking real benefits over traditional HTTP, FTP, and the rest.
My biggest concern is that in the end IPFS isn't even really "permanent" in the way I understand it. Objects added to IPFS still need someone to in a sense "seed" them for that content to be available. What advantages does that give over just hosting the internet over static torrents?
> It seems like it's just trying to incorporate a lot of the latest buzzword technologies without any real consideration why. I get that blockchain, Git, BitTorrent are all powerful but that doesn't mean that mixing them all together into IPFS is going to be useful.
While the IPFS whitepaper and spec do utilize a lot of modern technology terms, that does not mean they were assembled without serious consideration and purpose. What gives you that impression?
All of those pieces have important properties which create the essence of what IPFS is. When you say blockchain, Git, BitTorrent, I hear: directed acyclic graphs (like Git) with hashed hierarchical checkpoints (like a blockchain or Merkle trees), distributed peer-to-peer (like BitTorrent). This is literally what IPFS is, and using those terms is one way to describe it.
Take a look at this: "Content Model, or Replication on IPFS" https://github.com/ipfs/faq/issues/47 -- it's a discussion on WHY we had to separate out the replication part. Check out also:
- ipfs-cluster - discussion on a tool we'll build to replicate archives to many nodes and have a RAID-like cluster configuration - https://github.com/ipfs/notes/issues/58
> But if such a cost can save you hardware and server hosting, it's worth it.
The vast majority of this discussion is miles above my head, but this made my Spidey Sense tingle. Someone somewhere has to host the data, and seemingly in more than one place. There is no such thing as a free lunch.
One of the things I am most looking forward to is the abstraction into libp2p [1]. I want to try out my own ideas, but I don't want the hassle of building my own Kademlia DHT or NAT traversal.
> It is left as an exercise to the reader to think about why it’s impossible to have cycles in this graph.
This was funny. Suppose you wanted to build a node that linked to itself. You'd have to find a fixed point in the combination of functions that adds other data to the link and hashes it. Finding a fixed point of a hashing function is hard.
In fact, for an ideal cryptographic 256 bit hash function, modeled as a random oracle, it takes an average of 2^128 iterations before reaching a periodic point. The average cycle size of the reached periodic point is also 2^128. There exists a fixed point for your set of files only if the cycle size of the periodic point is 1.
Using the big-step, little-step cycle detection algorithm to avoid using gigantic amounts of memory, you're then looking at an average of 1.5 * 2^129 iterations of updating your graph of 256-bit cryptographic hashes in order to discover you've hit a periodic point.
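For a feel of what that search looks like, here's a toy version of the two-pointer ("big-step, little-step", i.e. tortoise-and-hare style) walk, run on a truncated hash so it actually terminates; with the full 256-bit output you'd be waiting for the astronomical number of iterations mentioned above.

    import hashlib

    def h(x: bytes, nbytes: int = 3) -> bytes:
        # Truncate SHA-256 to 24 bits so a cycle is reachable in a toy run;
        # with the full 256-bit output the expected work is astronomical.
        return hashlib.sha256(x).digest()[:nbytes]

    def steps_to_meeting(seed: bytes) -> int:
        """Constant-memory cycle detection: advance one pointer 1x, the other 2x."""
        slow, fast = h(seed), h(h(seed))
        steps = 1
        while slow != fast:
            slow = h(slow)
            fast = h(h(fast))
            steps += 1
        return steps

    print(steps_to_meeting(b"some file graph"))   # a few thousand steps at 24 bits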
Offhand, I don't know the probability that there's a fixed point for a given starting point for a random mapping of 256-bit values to 256-bit values, but my intuition is that it's vanishingly small. If anyone has an elegant derivation of the probability, I'd love to see it.
The expected number of fixed points in a random permutation is 1. This is an application of linearity of expectation: for a random permutation f of size N and a given input X, the probability that f(X) = X is 1/N, i.e. the probability that X is a fixed point is 1/N. There are N possible choices of X, and so by linearity of expectation the expected number of fixed points is N * 1/N = 1.
This doesn't tell you anything about concentration bounds or whatever, but it's a neat fact nonetheless.
Unfortunately, it's a random mapping, not necessarily a random permutation. An ideal block cipher would be modeled as a random permutation. Though, in this particular case the domain and range are the same, so unless I'm missing something, the expected number of fixed points comes out to 1 by the same reasoning.
Indeed, the expected number of fixed points is the same for both permutations and mappings. Quite curious when you think about it, because the distribution of fixed points is obviously different! (Consider the case n = 2 for the simplest example: there are mappings with exactly 1 fixed point, but every permutation has either 0 or 2 fixed points.)
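The n = 2 case is small enough to check by brute force, which makes both facts concrete: the average number of fixed points is 1 for mappings and for permutations, even though the distributions differ.

    from itertools import product, permutations

    def fixed_points(f):
        # f is a tuple where f[x] is the image of x
        return sum(1 for x, y in enumerate(f) if x == y)

    n = 2
    maps = list(product(range(n), repeat=n))   # all n^n mappings
    perms = list(permutations(range(n)))       # all n! permutations

    print([fixed_points(f) for f in maps])     # [1, 2, 0, 1] -> average 1
    print([fixed_points(p) for p in perms])    # [2, 0]       -> average 1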
Note though that a correct block cipher is necessarily a permutation, because it's invertible (by definition, a permutation is just an invertible mapping with domain and range equal). A hash function on the other hand needn't be a permutation even when you restrict the domain to inputs of the same bit length as the hash output.
Actually, I should point out that you can start your iteration initializing the hash links to random values, which means there's not just one cycle you could potentially arrive at for any graph of files. I didn't mean to imply that discovering a cycle of size > 1 for a given file graph proves that there don't exist fixed points for that file graph.
I'm pretty sure it's "just" extraordinarily difficult. With some things, the difference between the two is vanishingly small. Like, in principle, flipping a bit takes a certain amount of energy. So you can take how many bit flips the best algorithm to do something takes, convert it to energy, and compare it to "total energy output of the sun for the next one hundred years", then call it pretty much impossible if the algorithm's number is bigger.
What I didn't see answered in the article was how content is discovered.
The only way we are able to productively use Git is because there is a convention to keep some state in a non-content-addressable location (.git/refs, .git/HEAD, etc.).
Saying that IPFS could replace the web means either: 1) Introducing shared mutable state; or 2) full knowledge of everything on the network.
I'm guessing that the existing web is what provides that layer right now. Is there any work going on for novel IPFS-based content discovery mechanisms?
Another thought: Given the content-addressable, immutable nature of this graph, how does one discover that a new version of something is available without a central authority? How could we discover the tip of a blockchain with IPFS alone?
There are a lot of possible ways to approach this, such as overlay networks where peers interested in a topic can learn of and publish updates quickly to others (e.g. pubsub).
IPFS need not be a kitchen sink: it's very easy to layer application-level overlay networks on top of it that do other useful things.
Just pointing out that GNUnet and Freenet both offer pretty much the same feature set. I've studied both extensively, and after checking out IPFS, I don't get what's new - except all the 'hype' around it, which is generally something that I, as a tech nerd, dislike. Another problem with distributed solutions is often performance; some tasks just become surprisingly expensive.
The problem with this and other initiatives is that they don't really have a good story for what happens with the large amount of data-lint regular people have. They make broad assumptions that we have near-limitless storage resources (including those needed for redundancy), when private users definitely don't have that, and even at enterprise levels the story is still fairly complicated.
Immutable is an interesting idea - it's a lot less interesting when 100 different copies of the same slightly changed RAW file from my digital camera are using up 100s of gigabytes. Or I misclick and something goes into the public store which shouldn't - I might not be able to get rid of all of it, but I should be able to undo it a little.
If you look a little bit closer into Camlistore, you'll see that it splits content with content-defined chunking, i.e. when there's a change in the middle of a file, only that zone (Camlistore targets 8 kB chunks) is actually added to Camlistore; there's an indirection between the file as an object and the actual content. That means that hundreds (or even thousands) of slight changes to the same big file won't change much.
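A toy illustration of the idea (not Camlistore's actual algorithm, window size, or chunk-size targets): cut chunk boundaries wherever a hash of a small sliding window matches a pattern, so a one-byte edit only disturbs the chunk(s) around it while all other chunk hashes stay stable.

    import hashlib, os

    WINDOW = 48            # bytes of context used to decide boundaries
    MASK = (1 << 13) - 1   # boundary when low 13 bits are zero -> ~8 KB average chunks

    def chunks(data: bytes):
        start = 0
        for i in range(WINDOW, len(data)):
            window = data[i - WINDOW:i]
            h = int.from_bytes(hashlib.sha256(window).digest()[:4], "big")
            if (h & MASK) == 0 and i - start >= 2048:   # enforce a minimum chunk size
                yield data[start:i]
                start = i
        yield data[start:]

    blob = bytearray(os.urandom(256 * 1024))
    before = {hashlib.sha256(c).hexdigest() for c in chunks(bytes(blob))}
    blob[123_456] ^= 0xFF                               # flip one byte in the middle
    after = {hashlib.sha256(c).hexdigest() for c in chunks(bytes(blob))}
    print(f"{len(before - after)} of {len(before)} chunks changed")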
Regarding misclicks: with Camlistore everything is private by default; in order to make something public you have to build an authorization, which you give to someone. You also have the possibility to remove that authorization, and the data won't be publicly accessible anymore.
That doesn't address the issue: any compressed content (and raw files are losslessly compressed) tends to really break rolling hash type splitting systems.
- In this talk I discuss a bunch of data-structure stuff, including using IPFS for PKI, for arbitrary DNS-like records, for name systems, for CRDTs, and so on.
The graph to describe the directory is a misprint, right?
"testing 123\n" isn't anywhere, and "Hello World" (and its hash) is pictured twice. I'm sure that the testing.txt arrow should just be pointing to a node with a different hash and content.