Introducing Cloudflare’s IPFS Gateway (cloudflare.com)
621 points by jgrahamc on Sept 17, 2018 | 157 comments



This is my current favourite use case (from the blog https://blog.cloudflare.com/e2e-integrity/)

I wanted to provide an example of the kinds of secure, performant applications that are possible with IPFS, and building a search engine seemed like a prime candidate. Rather than steal Protocol Labs' idea of 'Wikipedia on IPFS', we decided to take the Kiwix archives of all the different StackExchange websites and build a distributed search engine on top of that. You can play with the finished product here: https://ipfs-sec.stackexchange.cloudflare-ipfs.com


Neat! Looks like this generally works by building a static index and publishing it to IPFS [1], then having some client-side JS look up terms in the index and fetch a metadata file for each result [2]?

I made something kinda similar a while ago [3], where I gathered a bunch of metadata about files on IPFS at the time by scraping a Searx instance, with a query like "site:ipfs.io/ipfs" or something like that. It was a quick hack job but was fun, and it's cool to see something similar on a bigger scale!

[1] like https://ipfs-sec.stackexchange.cloudflare-ipfs.com/ai/_index...

[2] like https://ipfs-sec.stackexchange.cloudflare-ipfs.com/ai/_index...

[3] https://ipfs.io/ipfs/QmYo5ZWqNW4ib1Ck4zdm6EKteX3zZWw1j4CVfKt..., https://github.com/doesntgolf/ipfs-search, https://github.com/doesntgolf/ipfs-searx-scraper


Yeah! Every question of "how would X work on IPFS?" boils down to data structures.

An example of a slightly different approach (a sharded index) is the wikipedia-on-ipfs experiment from last year: https://github.com/magik6k/distributed-wiki-search && https://github.com/ipfs/distributed-wikipedia-mirror


Wow, I never knew there was something like Lunr.js! In my free time I have been working on https://ipfsearch.xyz, thinking about pivoting toward search in JS, but never getting the time to do it.

Now I'm pretty sad that something like ipfsearch (most of my work was rebuilding Lunr.js) already exists.

I guess I should do more research before embarking on a side project :( Anyway, I definitely learned a lot about how search engines work...

P.S.: Now I'm noticing that Lunr can't fetch the index over the network. OK, so not all my time has been lost!


Yeah, and I think it actually uses a pretty old (0.x) version of lunr. You could definitely grab data over the network and then feed it into lunr's indexer; you'd just have to do it manually. I'd prefer to do it more like Cloudflare's StackExchange search (build the index locally and publish it to IPFS), rather than having the client fetch the data and build the index itself.
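
For example, the "index built ahead of time" flow could be as simple as this sketch (untested; `build-index.js` is a hypothetical indexer script, not something from lunr or Cloudflare):

    # build the search index locally, then publish pages + index together
    node build-index.js corpus/ > site/search-index.json
    ipfs add -r site/    # prints the new root hash to publish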

Your tool looks cool! If I find the time, maybe I'll try to dig in a bit and see if I can help with anything!


[flagged]


Would they really be interested if they just found out that most of their work was rebuilding an already existing tool..?


Are you interested?


Starting bid, 99 cents! In reality, why are you soliciting the sale of a domain name that doesn't seem entirely relevant to the project this person was doing?


Sold for 99¢. It's about to expire, and I would rather someone else have it, perhaps someone with a little more ambition than I.


Considering I want to use this as it is meant to be used (distributed), how do I pin all of, say, https://ipfs-sec.stackexchange.cloudflare-ipfs.com/ham/ (including the search) to my local IPFS node so I can use it offline?


Normally you would go about it like this:

    dig +short txt _dnslink.ipfs-sec.stackexchange.cloudflare-ipfs.com
^ will get you the root hash. Then for finding the hash of that particular directory:

    ipfs resolve /ipfs/QmVsEiBRsdiHMvmDVJTNtrXNqSKYdtoepcoZxUrQ8rU7qR/ham
But unfortunately it errors, probably because it's a HAMT-sharded directory and the `resolve` call hasn't been changed to accept them yet. There is an open issue about it here: https://github.com/ipfs/go-ipfs/issues/5270

Edit: Actually found a workaround.

Loading up the hash via our IPLD Explorer reveals the (renamed) name of the directory, which happens to be `05ham`, and we can call `resolve` on that:

    ipfs resolve /ipfs/QmVsEiBRsdiHMvmDVJTNtrXNqSKYdtoepcoZxUrQ8rU7qR/05ham

That gives us the `/ipfs/zdj7WZpLoZYr4AoDUdpCNBVHLi1S8y6muFoCgwUa9kmxUpYCr` hash, which you can use for pinning.


Now we just need a browser extension that does all that.

If pinning were a button press away, it would be much more attractive than making a DNS query, resolving the result, and pinning that.


Indeed, that's a good idea! I took the liberty of opening an issue about this on the repository for the IPFS Companion extension here: https://github.com/ipfs-shipyard/ipfs-companion/issues/581


So I've been trying to get

    ipfs pin add --progress /ipfs/zdj7WZpLoZYr4AoDUdpCNBVHLi1S8y6muFoCgwUa9kmxUpYCr

to complete, but it just takes hours and hours. Is this normal? What might be the issue?


Does it get stuck or are you still seeing that blocks are being fetched? If it's still increasing, it's just very large. You can check the download rate by doing `ipfs stats bw --poll`.

The weird thing is, I'm not seeing anyone providing it, but I can still pin it. Currently on `Fetched/Processed 1661 nodes`, but I'll leave it running and see if it finishes.

Doing `ipfs dht findprovs zdj7WZpLoZYr4AoDUdpCNBVHLi1S8y6muFoCgwUa9kmxUpYCr` should list everyone who is providing it, but I'm getting no responses, which is weird.

Edit: Hm, got stuck at 2007 blocks, are you seeing the same thing?


Fetched/Processed 6077 nodes

and ~/.ipfs is not even 40 megabytes...


How does IPFS index new content that's added to the StackExchange network daily? And/or how long does it take to add new data?


The site is a snapshot of StackExchange data. If a new snapshot gets published, a new index would be generated and uploaded with it by the uploader.


Ah. So it happens manually? Do automated backups occur? Or is that not a possible goal of IPFS? Just trying to understand.


The automation is something you can do relatively easily with the IPFS libraries, API, or CLI.

Ideally your approach to pulling the data from its source into IPFS is somewhat deterministic, so that you get de-duplication and generally more efficient updates.
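
For example (a rough sketch; snapshot directory names are illustrative), re-adding an updated snapshot only creates new blocks for the files that changed, because unchanged files hash to the same blocks:

    ipfs add -r -Q stackexchange-2018-09/   # first snapshot
    ipfs add -r -Q stackexchange-2018-10/   # unchanged files dedupe against existing blocks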


Stack Exchange for Kiwix, yes! I can't find Stack Overflow though - has anybody tried scraping it with https://github.com/openzim/sotoki ?


Kiwix has an archive of StackOverflow, but it's gigantic. :( We barely got the Math StackExchange uploaded in time.


Great Scott! 122GB. A weekly update on the local RPi would seem quite irresponsible on a regular basis - how would I use rsync with IPFS? (I could not find Wayne Davison's user id on HN; he seems to maintain rsync.) Are zim files optimized for rsync?


Here is one really important part that I almost missed when reading this announcement:

> Using Cloudflare's gateway, you can also build a website that’s hosted entirely on IPFS, but still available to your users at a custom domain name.

This is a huge step forward for the distributed web!


The killer app for IPFS in these early days should be people who are paying $10/mo to host their half-dozen HTML files on WhateverHost companies everywhere. Bonus points if they are hosting heavy files, like podcast files, for example.

Is there a Soundcloud clone on IPFS anywhere? (One that doesn't provide you with cloud storage, only the public index)


I'm working on something like this. The idea is that the artist hosts their own server and payment gateway, and therefore receives all the profits from any sales they make.

Adding IPFS on top of this would make it more robust.


I'm interested (although I'm not an artist, nor do I have any stuff to host besides my own rants).


Is there a way (even a paid one) to ensure that my data in IPFS is always available (at least as available as with "traditional" hosting services)?

That's a bit of a requirement when building a serious service/website.


As I understand it, the way is: run an IPFS node (perhaps on a cheap VPS) and use `ipfs pin` to force the node to retain the file even if it isn't being requested.
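
Concretely, that's just (hash illustrative):

    ipfs pin add /ipfs/<hash>        # fetch (if needed) and keep the content through GC
    ipfs pin ls --type=recursive     # list what the node is pinning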

Presumably someone could sell "pinning as a service" for fairly cheap: a popular file will be widely cached, so the service will rarely need to provide it; and an unpopular file will be rarely requested.


Isn't pinning as a service essentially a CDN?

Is there a reason to believe that such service would be cheaper than a "traditional" CDN?



CAP theorem.


What we're talking about only requires the A and the P.


Perhaps you can elaborate?


You can use any gateway to do that :)


Cloudflare offering it might legitimize IPFS in ways that non-corporate entities championing it don't.

Not a huge Cloudflare fan, but I will admit this is cool of them to spend engineering time and money on.

EDIT: Thanks for the efforts to promote the Distributed Web. A surprise, to be sure, but a welcome one.


That's the whole point. We think the Distributed Web is cool and coming and we want to accelerate that happening. Hence the time and effort to make this happen.


I was just wondering the other day if IPFS would get enough traction to reach critical mass. This is a very hopeful sign I think.

Cheers!


That was one of the drivers. We liked the core technology, but could see there was a significant barrier to actually using it. Being able to point your domain at IPFS content and then provide that (SSL-secured) site to the masses is a big step forward.


That is wonderful to hear


Bravo


A most welcome feature.


We wrote a tutorial on setting it up on your own (a secure nginx server as an IPFS gateway): https://medium.com/textileio/tutorial-setting-up-an-ipfs-pee.... But for sure, for most users, going right through Cloudflare is an awesome simplification. Exciting stuff.


You can use any gateway with your custom domain name? Potentially you could, but I don't know of any that let you do this.

My impression of the public gateways was that the end user would need to know the content hash of your website which makes for a poor user experience.

This announcement claims the end user just uses the website as they normally would by visiting example.com but behind the scenes there is no central web server because the content is pulled from IPFS.

DNS was the missing piece :)

https://developers.cloudflare.com/distributed-web/ipfs-gatew...


You can use custom domains with any go-ipfs gateway, it'll transparently turn `Host: example.com` into `/ipns/example.com`.
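
For example, against a local gateway (assuming the default port 8080 and a domain with a dnslink record):

    curl -H "Host: example.com" http://127.0.0.1:8080/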

What Cloudflare adds to the mix is their infrastructure for automatic TLS certs.


I was running my own website with my own domain on IPFS quite successfully for a few months, and I configured Cloudflare just to get the green HTTPS icon.

The only problem is that I have to visit my website once in a while; otherwise the content is dropped from the gateway's cache.


Can you?

You will still need to run a node to host your own content on IPFS, unless you simply plan on using data that already exists in the wild.


Any plans to support Dat in the future too? I've had a bunch of distributed projects in mind recently, and I've still not decided which platform to start experimenting with properly, but Dat is definitely one of the other interesting ones alongside IPFS.


No plans currently. I'm not really familiar with Dat. Can it do something that IPFS can't?


Yes. IPFS is purely hash-based, which is great for static images/movies.

DAT is also hash-based, but at least it has support for top-level asymmetric keys that you can put files into, and ADD files to, without the root directory changing its hash. IPNS works around this, but isn't ideal.

Neither system can handle high-frequency P2P data that changes - for instance, Reddit-like sites. Those ( http://notabug.io ) and other sites like the Internet Archive (which has decentralized IPFS, WebTorrent, and GUN versions) are built on our system (https://github.com/amark/gun) and are already pushing terabytes of P2P traffic.

And don't forget about WebTorrent and Secure Scuttlebutt!!!


Not sure if this fully addresses your use-case, but I like the idea of serving a static bootloader from IPFS. The bootloader would contain all of a website's assets, and code for getting dynamic content from a backend. The backend could be:

- A central API where the bootloader can do arbitrary validation on the API responses.

- WebTorrent, Scuttlebutt, IPFS PubSub, etc.


Yes, that is already what the P2P Reddit does, but without IPFS (although this is a good idea!), and using GUN as the "backend" (fully P2P/decentralized though), SEA for validation/security (no need for a central one), and DAM for pubsub (no flooding problems like in libp2p) which can do WebRTC.

I'm sure people would love to see an IPFS version of a bootloader, instead of HTTP, that is a cool idea. Have a repo for it?


It's both similar and different in many ways :) But I think adding support for it alongside IPFS wouldn't be too much of a pain, as superficially they work in somewhat similar ways. Primarily Dat is targeted towards sharing/storage of research data, which I think would be a cool thing to spread support for.

It has a browser, called Beaker, that allows people to easily make and navigate sites. It has some really interesting projects built around it, and as marknadal says, it supports changing the contents of locations etc.

I believe Dat works over Tor currently, which is interesting. However, finding info on successes/problems with the various P2P stacks and Tor is a bit hit-and-miss.


Total noob questions, would love some answers if anyone has them.

Here it says:

"IPFS is a peer-to-peer file system composed of thousands of computers around the world, each of which stores files on behalf of the network. These files can be anything: cat pictures, 3D models, or even entire websites"

- If I run an IPFS node, will I be hosting other people's data? What if these files are illegal? How can my "computer" (I guess a node is the terminology but whatever) be sure I won't be hosting some bad stuff?

Then it says:

"The content with a hash of QmXnnyufdzAWL5CqZ2RnSNgPbvCc1ALT73s6epPrRnZ1Xy could be stored on dozens of nodes, so if one node that was caching that content goes down, the network will just look for the content on another node."

- How long does it take for one specific file to be "re-distributed" to another node? And how many times is a file rehosted/distributed on different nodes? It would be stupid to have the same file hosted 1,000,000 times - what's the cutoff point? Is there a health check, so to speak, that detects when a certain file is in danger of disappearing from the network and therefore automatically pushes it for re-distribution?

And ultimately, how can I know for sure that CloudFlare won't play the game where, acting as a proxy, they modify some of the files served? Imagine I want to retrieve cat.gif and Cloudflare intercepts my request and serves me cat1.gif - I guess it all boils down to trusting them? But hold up, isn't a P2P file system like this all about trusting the network and not one server?


The IPFS network works on a pull model, not push. So when nodes add content, they only add it locally. It's not until some other node requests it that it actually transfers.

So if you start an IPFS node locally, it won't get content pushed to it (well, some DHT data, which is basically routing information, but not the content itself).

> ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served?

You'll have to add some checks on your end for this, as HTTP/browsers cannot provide these guarantees currently (maybe in the future?). So, download file A and add it to IPFS on your end. If you end up with the same hash, the content was not modified.
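
Something like this sketch (untested; hash and filename illustrative):

    curl -s https://cloudflare-ipfs.com/ipfs/<hash> -o cat.gif
    ipfs add -Q --only-hash cat.gif   # compute the hash locally without adding
    # if the printed hash matches <hash>, the gateway served the content unmodified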

> isn't p2p file system like this all about trusting the network and not 1 server

Yes, IPFS <> IPFS is trustless as this verification happens automatically and the clients can verify content locally when it gets fetched. IPFS <> HTTP is harder as there is no functionality for doing this verification.


IPFS is a protocol that dictates how its clients are supposed to validate the data. If you use any of the client libraries, they will do this for you (go-ipfs/js-ipfs). If you want to use Cloudflare to distribute your index, you need to trust them to use a valid client implementation. Otherwise you can serve an index file off of your own server/network and distribute the rest of the assets via IPFS.


> You'll have to add some checks on your end for this as HTTP/browsers cannot provide these guarantees currently

Browser extensions could, by verifying content as it arrives.

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...


A browser extension could be used to request the data straight from ipfs instead of Cloudflare.



I believe the IPFS team has already built a browser extension that runs a js-ipfs node. Running an IPFS node is not ideal for power- and bandwidth-constrained devices like mobile phones. This is where I see gateways fitting in, long-term. Not having to trust a gateway unconditionally is necessary to make this viable.


> And ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served? Imagine I want to retrieve cat.gif and cloudflare intercepts my request and serves me cat1.gif - I guess it all boils down to trusting them?

The hash serves that purpose: the file is identified by its hash for the purposes of retrieval, and coming up with another file with the same hash (especially one that's usefully different, like changing the overall message in somebody's blog post) is computationally infeasible. Now you might have a problem there in that, in a web-only implementation like this, the code that checks the hash is probably also provided by Cloudflare, but the functionality is there: you can download the file, compute the hash, and compare it against the one you retrieved it from.

> But hold up, isn't p2p file system like this all about trusting the network and not 1 server?

Ideally yes, but 1. it's very difficult to get more than a handful of users to use anything that doesn't run in a web browser, 2. IPFS' implementation for Windows is effectively unusable (it's a command-line interface like the Linux one, which is a non-starter on Windows), 3. attempts to build a complete IPFS node in the browser are currently incomplete, I believe because DHT discovery isn't possible in the browser presently. These gateways should probably be viewed as a stop-gap solution until such a time as we can have full IPFS nodes in the browser.


This browser extension can help you guarantee e2e integrity in the meantime: https://blog.cloudflare.com/e2e-integrity/


> And ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served?

https://blog.cloudflare.com/e2e-integrity/


It's like a torrent. You don't host data that you haven't accessed. For content to stay hosted on your computer forever you have to explicitly pin it.

The number of people hosting it is the number of people accessing it.

CloudFlare can't modify the files. On IPFS you request files by a hash. If they give you a file with a different hash you know they gave you the wrong file.


I think the point is that they're hosting an HTTP proxy to IPFS. Since you're accessing IPFS files over their proxy they can modify the response you receive, since at that point it's just like any other website.


> Since you're accessing IPFS files over their proxy they can modify the response you receive, since at that point it's just like any other website.

Correct, but gateways aren't intended to be final solutions. They're compromises targeted at increasing adoption. You wouldn't try and reinvent the internet - you'd make it interoperable where possible, such that adopters can still send normal links, etc.


> - If I run an IPFS node, will I be hosting other people's data? What if these files are illegal? How can my "computer" (I guess a node is the terminology but whatever) be sure I won't be hosting some bad stuff?

That's one of the reasons why I prefer the 'Dat' and torrent (be it classic or DHT) model of per-content distribution, instead of IPFS, which is per-block.

The IPFS model makes a lot of sense for cloud storage. You might create a whole CDN based on its model, where the clients don't even need to care about your backend storage layer.

But in the per-content P2P distribution model, we have the 'page-rank' effect, where popular or important stuff is curated by human judgment of what's popular and important.

This model has worked pretty well with torrents, and public interest should drive the P2P storage model; by using blocks, you lose some of that control and context. To see how this is a problem, consider that the IPFS folks created Filecoin to deal with the problem of incentives, which in the 'per-content' model of Dat and torrents is a non-issue because of the natural page-rank-like indexing of content driven by public interest.

If something doesn't matter much, it will go away with time, just like our human brain works.

(Of course, if IPFS added some form of indexing of content by public interest on top of the block system, and even let you control what you serve, that problem would be solved.)


(Disclaimer: I work for Protocol Labs on IPFS and other projects [not Filecoin though])

I'm not sure I understand what's different in the way data is "seeded" or "hosted" in Dat and torrents compared to IPFS. All of those require you to initiate the transfer; nothing gets transferred by itself (unlike Freenet, for example).

Blocks in IPFS are just a detail: a file bigger than X MB gets split up into many smaller "files" so they can be reused between files and are simpler to transfer. It has nothing to do with which peers are hosting the content.

In the end, in IPFS, Dat, and torrents, you download and share content based on an ID, and that ID can mean an entire archive of content, or just one picture.


It's not the technical part. Overall, I even think that, technically, IPFS has a superior model. I understand that it works, and how it works.

What I'm saying is that organic control of what's important is not there yet. The protocol controls what blocks the peers serve, and people don't have much of a say in it.

But given that IPFS is layered in a good design, this could be changed: the block scheduling/routing could be bound more to this organic page rank by letting people control more which blocks they serve, being more content-aware.

This could even be done at a topic level - say you are OK with serving "free software". I guess that if you guys implemented something like that, giving peers some control over what they serve, it would be much easier to sell the concept to everybody.


Edit: Never mind my parent comment; I guess the 'pinning' concept of IPFS does what I'm trying to elucidate here. Sorry for the noise.

(But the block scheduling algorithm could/might change in the future, maybe? Turning this into a real issue?)


I really didn't expect this to happen, congrats to the ipfs team! A demo link over here:

https://cloudflare-ipfs.com/ipfs/QmS4ustL54uo8FzR9455qaxZwuM...


This is awesome. We've been running an IPFS gateway at Origin for about a year and are huge fans of the technology. It's nice to see Cloudflare investing in decentralized technology despite contributing to the over-centralization of the web with their core business. There are quite a few public IPFS gateways available, but not all of them are broadly advertised. A few others that I know of:

ipfs.io

ipfs.infura.io

siderus.io

ipfs.jes.xxx

gateway.originprotocol.com


Just want to add an alternative gateway that I've been working on recently: dapps.earth

It is somewhat different in the sense that it redirects from `gateway/ipfs/hash` to `hash.ipfs.gateway` to ensure different ipfs websites are hosted at different origins (Cloudflare also talked about origin issues in their announcement).

Now, with what Cloudflare released, they do seem to solve some of the same problems, but they still ask users to register the domains by themselves (or access content in an unsafe way with `/ipfs/hash` URLs), which means those domains become a centralization point, and also most of the content is served from the same domain.

It would be fantastic if they allowed access in form `hash.cloudflare-ipfs.com` or `ens-name.cloudflare-ipfs.com`.

In the meantime I may just abandon the idea of serving IPFS content myself, and instead operate a small DNS server that returns a CNAME record pointing to Cloudflare and a TXT record pointing to the IPFS hash for each subdomain.
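
That is, roughly these two records per subdomain (zone-file style; names and hash illustrative):

    sub.example.com.           CNAME   cloudflare-ipfs.com.
    _dnslink.sub.example.com.  TXT     "dnslink=/ipfs/<hash>"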


I just went through our "Public IPFS Gateway Checker" and added a bunch of new ones (including the Cloudflare one). You can see it running here: https://ipfs.github.io/public-gateway-checker/

And of course, the source code: https://github.com/ipfs/public-gateway-checker/


Oh neat. Should I send a PR to add the others?


Please do :)


This looks great and will hopefully help IPFS grow!

I do wonder if there are any limits which CF are going to impose on these files. A few years ago there was an image hosting site [1] that was told by CF that it was using too much bandwidth on the free plan. With this gateway, can't anyone start doing exactly the same and not even have to be a CF customer?

[1] https://news.ycombinator.com/item?id=12825719


Is Cloudflare doing this in partnership with Protocol Labs, the company backing IPFS? Or are they doing it on their own?

I remember that the way Cloudflare communicated around Nginx rubbed a lot of people the wrong way, because they aggressively positioned themselves as "the de-facto Nginx company", to the point where they would sometimes announce a new Nginx feature on their blog before the Nginx developers themselves had a chance to announce it... And of course nobody was more pissed off than Nginx.com, the actual official sponsor of Nginx.

I'm wondering if they will behave the same way with Protocol Labs and the existing IPFS community? I sure hope not; I like Cloudflare and love IPFS, and would like to see everyone collaborating in good spirits, not competing for cheap marketing points.


...and the devil introduces his own church. IPFS is about decentralisation! And Cloudflare is the devil of centralization!


You can easily replace their domain with any other IPFS gateway. On top of that they add further validation to the idea that IPFS is a good idea. Seems like an overall win to me.


If you analyze any decentralized system long enough you'll discover that it can't be used in practice by regular end-users without some element of centralization.

(Dboreham's Law)


The centralized system Google does not know anything about any "Dboreham's Law".


I agree. I think the fact that we need a gateway is a big problem. And relying on a big company like Cloudflare is a bigger one.

I think that overall this might help for adoption as long as people don't start using Cloudflare URLs. But ultimately we need IPFS somehow integrated into Firefox and Chromium I believe. Or some other seamless integration software.


... and with access to the plaintext, SSL-stripped data of all traffic


Exactly; it becomes more and more obvious that Cloudflare's role in the intelligence apparatus is growing.


IPFS offers content integrity, which you can use to bootstrap confidentiality.


Confidentiality without central gateways? The topic was "with stripped-down SSL", and every user should be aware that Cloudflare can read and store the content of every connection made to a domain behind Cloudflare's gateways, even if it seems to be end-to-end encrypted.


Cloudflare is great. I love that they spend resources on projects like this. This would normally be a moonshot or someone's side project that would get limited/no release.


This is cool from an implementation standpoint.

However, doesn't this defeat the purpose of using IPFS? Using Cloudflare to front-end content stored within IPFS makes Cloudflare the choke point for all traffic, effectively re-centralizing the distributed content.


It doesn't change IPFS fundamentally; it just provides a simple way to get to IPFS content and a place where it gets cached. We hope it helps legitimize IPFS.


Hey, thanks for your work! This is an exciting start to the dWeb!

Any chance you guys would also support DAT, GUN, WebTorrent, and SSB? I know Feross, Dominic, Mathias, etc. would love to introduce you to them. Ping me? mark@gun.eco


You'll need a "centralized" version of a web site for those that don't use IPFS today. That's what a gateway provides. Clients using the IPFS protocol will ignore them and fetch the distributed version.


>Clients using the IPFS protocol will ignore them and fetch the distributed version

Except they won't know the hash for the content they're looking for..


Clients and gateways use the same DNS entries to know that.


Yes, and Cloudflare has already demonstrated both their ability and willingness to de-platform.

See https://news.ycombinator.com/item?id=15031922


Somehow, this is the first time I've heard of IPFS. Seems really awesome! From a total novice standpoint, can someone help me understand:

>With IPFS, every single block of data stored in the system is addressed by a cryptographic hash of its contents, i.e., a long string of letters and numbers that is unique to that block. When you want a piece of data in IPFS, you request it by its hash. So rather than asking the network “get me the content stored at 93.184.216.34,” you ask “get me the content that has a hash value of QmXnnyufdzAWL5CqZ2RnSNgPbvCc1ALT73s6epPrRnZ1Xy.”

It seems like you must know the "URL" of the "website" you want to visit (files in IPFS) beforehand? But in the case of IPFS, there's no DNS-like service, so you can't type "www.google.com." Basically, it'd be as if, to navigate the modern internet, you'd need to know the IP address of whatever site you visit? Is that true of IPFS? Is there any way around that?

It seems like a strong limitation, unless someone can make some sort of IPFS search engine that happens to hash out at QN000000000000000000000 or some really memorizable hash... which seems extremely unlikely!


IPFS has a name resolution protocol, called IPNS. Conceptually it works similarly to DNS. More advanced discovery protocols are being worked on. The thing with IPFS, though, is that because the raw identifier (not address) is the hash of the content, multiple nodes can serve the same content.

A key idea is the focus on decentralization: for example, when you want a particular piece of content, you could send a query out asking your known nodes for content with that hash; they then ask others, etc., propagating the request like gossip. Caching can make this more efficient. IPNS allows you to register your node as a provider of content that has a given name. The biggest benefits of this are (1) that you can update the content (giving it a new hash) and people can still find it by name, and (2) most mere mortals can't remember hashes, but names are much easier to remember.
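
In CLI terms, the publish/resolve cycle looks roughly like this (a sketch; hash and PeerID are placeholders):

    ipfs name publish /ipfs/<new-content-hash>   # sign a record mapping your PeerID to the hash
    ipfs name resolve /ipns/<your-peer-id>       # others resolve the name to the current hash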

A good introduction to IPFS can be found on HackerNoon, at https://hackernoon.com/a-beginners-guide-to-ipfs-20673fedd3f.

A good library to start with is Libp2p, https://github.com/libp2p


IPNS is a poor answer/solution to the original comment.

Don't get me wrong, I love IPFS (it is a perfect match with our tech), but IPNS, while a nice-sounding solution to the problem, suffers terrible performance problems from the IPFS requirement of hashing.

Unfortunately, I know this for a fact because multiple companies have already moved from IPNS to us to handle the indexing/updating of decentralized data in a mutable context, BUT they still use IPFS for files/photos/etc.

So IPFS is great, but you are better off using another system for naming, indexing, searching, etc. and then pointing to the IPFS hashes. I'm biased, but several sites (D.Tube - a couple million+ monthly visitors; the P2P Reddit http://notabug.io - a thousand daily actives; the Internet Archive; etc.) are using GUN (https://github.com/amark/gun) for this instead.


> suffers terrible performance problems from the IPFS requirement of hashing

Could you be more specific? We're aware of a good bunch of reasons for IPNS's varying performance, but hashing isn't one of them.


IPFS is a collection of many different protocols put together. So regarding naming, the modularity of IPFS allows multiple different naming schemas to point to the same content. And users can verify this, as it's all content-addressed.

This image is usually very helpful to look at to understand the architecture: https://ipfs.io/ipfs/QmXzrZY68r958KZtaWMo7DHRB29dePhxSv7DYFm...

So one way to achieve human-readable names is having a DNS record pointing to a hash. The user would still go to `example.com`, but underneath, it would be served `/ipfs/QmaSD`. It could also use IPNS (the currently built-in naming system for PeerIDs in IPFS) to point to the same content.

Once there are more (or better) naming systems, it'll be easy to add them to IPFS.

But currently, DNS is probably the most used, together with IPNS, as most people are used to DNS and it generally works (though not offline, so it's far from perfect).

Edit: re your own example, with DNS + IPFS it would go something like this: you ask "What's in the records of example.com?", DNS answers "this gateway + this hash", and you request the hash from that gateway.
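
You can watch that first step yourself with dig (record contents illustrative):

    $ dig +short TXT _dnslink.example.com
    "dnslink=/ipfs/QmaSD..."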

I actually made a video explaining the whole dnslink thing as well if you're interested in some visualization on how it works: https://www.youtube.com/watch?v=YxKZFeDvcBs


Check out the white paper at https://github.com/ipfs/papers/raw/master/ipfs-cap2pfs/ipfs-..., section 3.7. There is a name system called IPNS.


I don't know how IPFS handles this currently, but Tor has the same problem (their addresses are basically a hash of the server certificate). The Tor community handles this by widely advertising the addresses of a few entry points. From there you use classic link traversal and bookmarks. In general, bookmarks become extremely important once you can't memorize addresses.


This FAQ from github is a helpful summary of human-readable IPFS via IPNS:

https://github.com/ipfs/faq/issues/16#issuecomment-232497229


Does anyone know whether large content that is widely available becomes quicker to download, similar to well-seeded torrents in BitTorrent?


Yes it does -- IPFS will fetch from multiple peers at the same time. It's not terribly optimized yet for finding even more peers for content if it's already fetching from some, but it already does generally fetch from multiple at the same time.


I'm a bit worried about the abuse potential here. It seems like a great way to distribute movies, warez, etc since not only is Cloudflare paying for the bandwidth but they're also caching the content so you won't be bottlenecked by IPFS itself.


From the article:

IPFS is a peer-to-peer network, so there is the possibility of users sharing abusive content. This is not something we support or condone. However, just like how Cloudflare works with more traditional customers, Cloudflare’s IPFS gateway is simply a cache in front of IPFS. Cloudflare does not have the ability to modify or remove content from the IPFS network. If any abusive content is found that is served by the Cloudflare IPFS gateway, you can use the standard abuse reporting mechanism described here.


Do you have any idea how that'll scale if ipfs takes off? Everyone will be using it to host copyrighted content and you'll end up with the same situation that places like youtube have. Either you'll get sued for making the stuff available through your gateway (I doubt you have the staff to manually take it all down), or you'll have to implement a system like youtube where you trust major copyright holders to take down ipfs content themselves. Right?

What are you going to do when people (on other sites) put up pages with fake IPFS links that are identified as copyrighted information, but are really legitimate files (let's say, <a href="ipfslink">Crazy Rich Asians (2018) 1080p</a>, where ipfslink is the latest minified jQuery on your gateway)? You're going to get a takedown for a link like that, regardless of the actual content, if copyright holders operate as they do now.

Do you think you can require copyright holders to be more diligent and start ignoring their takedowns if they make too many mistakes? Require them to actually verify that the ipfs content (direct hash, or some perceptual hash of it) is their copyrighted content? That seems highly unlikely to be acceptable to copyright holders. If you could require that copyright holders do that before sending takedowns, the next step would be copyright violators hosting massive random files, linking to them as if they're movies. Copyright holders would waste tons of bandwidth downloading that random data, and burning cpu to hash it only to discover it's not their stuff.

I want ipfs to catch on, but I don't see how gateways like this can work, without getting sued into oblivion, or getting into bed with copyright holders and suffering so many mistaken and maliciously false takedowns that your gateway becomes a neutered content distribution platform... like youtube... with lots of legitimate content getting censored.


"Cloudflare will forward abuse reports that appear to be substantially complete to the responsible website hosting provider and to the website owner. In response to a substantially complete abuse report, Cloudflare will provide the complainant with the contact information for the responsible website hosting provider so they can be contacted directly."

So where does this abuse report end up?


For non-IPFS -- It depends on the notification settings selected by the complainant, but generally speaking we notify the website owner based on the Cloudflare account email address, and notify the hosting provider of the website.

For IPFS -- there isn't an "owner" of the content in IPFS per se, and as such we will review the specifics of the report and then determine the appropriate next steps.


"and as such we will review the specifics of the report and then determine the appropriate next steps. "

- I understand this thing is new and policies are being created as issues are discovered, but I think it would be best to be more up front about the coming blocking of things: whether you all will post a notice, or just make it appear that the whatever-file is not found, and things like this.

That statement is very broad, and I can see a future where lots of niche communities are eventually blocked for varying reasons.

Which countries are going to be able to tell you all what to take down?

There was a time when I had a lot of faith in Cloudflare, and the whole, it's a pipe / telephone line, not a moderated server - but the whole stormfront thing weakened your position with all the powers that be from what I can tell.

So if a bunch of <insert-niche-com-here> start posting links based upon the cloudflare-ipfs-gateway/controversial-not-mainstream-whatever1, 2, 3 etc., and you all end up nuking all of them, will people know that the files still exist and can be accessed by other means aside from the Cloudflare portal?

I would think that Cloudflare needs to split up and create a US company and an EU company and a Canada one, and a Japan one... I mean, there are so many conflicting rules around the world; when you want to censor the internet you just pick a rule from some place on the planet and file a complaint - I think you could remove just about anything.

I like the things you all are trying to do, but you're a central place under whatever authorities, with a fickle CEO that can single-handedly change everything, along with who knows how many agencies that can stick a paper to your head and make you change - I think it's time your company became decentralized itself.

I hope that happens, or some rule is made saying you all are just dumb pips and can't make censor decisions, but I'm not holding my breath for that.


"dumb pips" was supposed to be "dumb pipes" - I'm sure most or many would have gotten that typo right away, but for those who did not, I was referring to pipes not people.


The best you can do as an IPFS gateway operator is block particular hashes.

I have had to do this when a botnet was fetching its code from my public gateway and accidentally DDoS'ing my server :).
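
If the gateway sits behind nginx, the block can be as crude as this sketch (hash illustrative):

    location ~ ^/ipfs/QmBadHashGoesHere {
        return 451;   # Unavailable For Legal Reasons
    }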


Where do abuse reports usually end up?

In my experience they end up with the DMCA officer for the site.


After quickly glancing at the IPFS documentation, I can't find the superiority over... say, BitTorrent with DHT and magnet links (no tracker server or torrent file required).

Save for data deduplication and IPNS, maybe.

And isn't it rather contradictory? A gateway for a P2P-based filesystem.


You can't easily interlink torrents in a way that torrent clients could understand -- torrents are mostly just packages of data, not made to be a web.

On your point about the gateway -- I've co-maintained the ipfs.io gateway for the last 3 years and my perspective on its importance is the upgrade path from the existing systems. When developing and deploying new protocols, you need to onboard both users and applications/websites, and the most effective way to do so is seamless interoperability between old-web and new-web. I would imagine it tricky to get any critical mass of adoption if everyone has to choose either old-web or new-web.


But as I understand it, BitTorrent can express directory structure. So IPFS features like mounting via FUSE or browsing through a web browser via a local server can be implemented as a frontend to BitTorrent too.

Currently, there are tons of independently developed BitTorrent protocol implementations, while IPFS has very few implementations, all from the same body.

You can't beat the existing software with slightly better implementations. Apart from probably IPNS, most IPFS features can be achieved by a BitTorrent frontend too.

As for the gateway: if somebody wants a gateway for a P2P network, that's because the P2P network itself is inefficient or difficult to use. If the P2P network is difficult to use, not many people run their own node; rather, they rely on somebody else's node.

We have seen the bad consequences of this a number of times in cryptocurrency.

In order to trust a P2P cryptocurrency, it's essential that you run your own node for your wallet on a computer you own and trust. But the reality is, it's pretty inconvenient or difficult to do for most users. So what happened was that a very few big central nodes manage the wallet for you and exchange between different currencies for you. At this point, it is not a P2P cryptocurrency. The big central node becomes a single point of failure: if it goes down or is breached, you're doomed.

For a healthy P2P network that can replace the existing central-authority networks of today, all users must run their own node, and it must be very, very easy to do so. So there should be no demand for a gateway.


The part I'm still confused about is: how does the cache get updated? For example, in the traditional web architecture, if I go to www.example.com/index.htm, the host server of example.com tells me the hash of index.htm, and depending on that my web browser decides to use its cache or do a fresh request.

How would this work in IPFS? How about dynamic pages? Does that mean my browser still has to contact www.example.com to get the latest version's hash, but then has the option to request the file from IPFS instead of www.example.com? What happens if example.com goes down?


In basic IPFS the address is a content-dependent hash - the same URL should always return the same content (excepting hash collisions, which should be so rare and improbable as to never occur in practice).

So the short answer is that there's nothing to update: a URL will return the same content (or no content, if no one has it pinned anymore).

AFAIK the IPFS approach to this conundrum is a name/resolver system - which essentially adds indirection.

This is Thomas; he'll always be Thomas. But for now, he's also class valedictorian. Some time in the future, Thomas will still be Thomas, but someone else might be class valedictorian.
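
A quick way to convince yourself of the "Thomas" part, assuming a local ipfs install (the printed hash is the same on every run):

    echo "hello" | ipfs add -Q --only-hash   # prints a Qm... hash
    echo "hello" | ipfs add -Q --only-hash   # prints the exact same hash - nothing to update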


This seems very similar to Freenet. The only difference I see at a glance is that Cloudflare is running gateways so that everyone does not have to run a node.


On Freenet, other people send you files to store; on IPFS, you only store your own files, files you've pinned (basically seeding), and files in your cache. Freenet is built for anonymity, while IPFS is built for performance.


You can simplify Freenet's behavior to storing only cached files, though.


Being from Wix and managing petabytes of media files, we have always wondered if a distributed web solution for websites could actually work.

Even with the best of clouds (more than one of the big names), we have faced downtime caused by multiple cloud failures.

Just imagine: a store that is geo-distributed, resilient, and just works.

This is super exciting for the web and for customers of websites!!!

Big thumbs up.


Is IPFS really a store, let alone a store with any particular performance or availability properties? If I want to be sure a file is available, don’t I have to host it or pay someone to host it? And if I want to make sure it is geo distributed, I have to geo distribute it or pay a service to? What is the big change?


With IPFS you could rent a few cheap storage servers at different providers around the world and have them all pin your content. That should be resilient against most classes of failures (unlike a cloud provider, where bad code pushed to production by the cloud host can cause failures in all availability regions).

It's not exactly the first protocol enabling that, but IPFS also has the benefit of distributing bandwidth if your clients all access the files via IPFS (since they will serve them from their cache).
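
The replication step is then just the same pin on every box, e.g. (sketch; hostnames and hash illustrative):

    for host in fra1 nyc1 sgp1; do
        ssh "$host" ipfs pin add /ipfs/<root-hash>
    done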


I see, so geo distribution could be as simple as making the file available in multiple locations, with IPFS doing the rest.


IPFS is not a store, it's a protocol. You still have to pay someone to host it.


This is excellent. I've currently got a hacked-together static page for my personal site, which is fronted by CloudFlare but forwards the actual request on to ipfs.io (which, in turn, serves the files -- ultimately -- from where I've pinned them on Eternum). This will let me take a step out of that chain.


That's a nice initiative.

BitTorrent files are also content-addressable, and the DHT allows for routing to several provider nodes. Only files that are downloaded are shared, too.

What's the added value of IPFS over BitTorrent in this scenario?


Torrents don't have the concept of links at all, they're not a web.

Also: Being able to inter-link any content-addressed data structures. With IPLD [1], you can link from an IPFS directory to a Bitcoin transaction, to a Git repository, to a Torrent.

[1] https://ipld.io && https://github.com/ipld/ipld#implementations


From what I understand from the ipld.io website, IPLD allows adding links to any content-addressable system. BitTorrent + IPLD could be an alternative to IPFS, I guess? (BTW, notice that BitTorrent isn't used as an example on the front page.)

When I looked at IPFS some months ago, it looked like a lot of ongoing work. It wasn't clear to me what was new, what the reusable components of the project were, and what was reused from the past.

In BitTorrent, the DHT, for example, is a reusable block that emerged from previous P2P systems, but it didn't try to reinvent the wheel, and it didn't try to add an FS on top of it, so it solves a specific problem.

"IPFS" does a lot of things at once; it would be clearer if the project were cut into sub-specifications that could be reused with legacy protocols. Though I believe the project is going in that direction - for example, IPLD emerged from it.

IPLD looks like a real step forward. It looks like a "federation" of previous content-addressable protocols, to make them work together, rather than trying to compete with them.

BitTorrent + IPLD makes more sense than IPFS to me: more developers have heard of BitTorrent and know its features, so the introduction of a concept on top of it is easier to grasp.


> Torrents don't have the concept of links at all, they're not a web.

BitTorrent has content-addressable links: https://en.wikipedia.org/wiki/Magnet_URI_scheme

You could perfectly well form a web with the BitTorrent network by sharing, e.g., an HTML file which links to further resources.
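
That is, a link of the form (values illustrative):

    magnet:?xt=urn:btih:<40-hex-infohash>&dn=page.html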


One advantage of IPFS is its developer ecosystem. They are very focused on providing building blocks to developers, and there is a very active open-source community building on top of it. Contrast this with BitTorrent, which is more mature and has more nodes (I think), but is less flexible and not designed for building interesting new applications on top of. It's definitely possible to hack cool things on top of BitTorrent, but you'll spend more time navigating arcane documentation and deserted web discussion forums... The difference in community momentum is palpable.


I disagree with that. Implementing BitTorrent is relatively straightforward: it has specs, and a bunch of independent implementations exist. IPFS, meanwhile, is mostly a single code project with obscure docs and a push to use that code rather than make your own implementation.


Hm, last time I counted, there were three implementations of libp2p (the networking stack from IPFS): in Go, JavaScript, and Rust. Multiformats are also implemented in 5+ languages. There are two official core implementations right now (JS + Go), but there are probably more around that I personally haven't seen. Well, one in C or C++ as well, but I cannot find the link for that one right now.

But I agree that in general there are not as many implementations and/or specifications as with BitTorrent.


But I don't want to rely on libp2p or any IPFS code, for that matter.


Yes, it is well documented and you can easily find examples. But for atypical use cases you won't find a very lively community to discuss with (and with Bittorrent, there is not a lot of variety in use cases). There's just less activity around building new things. At least that's been my experience.


I agree with you regarding the technical aspect; however, the drawback of BitTorrent is the widespread stigma related to piracy. That results in a lot of ISPs blocking the protocol.

IPFS is a fresh new brand, not (yet) besmirched by piracy, though technically it can serve it the same way.


> Cloudflare’s IPFS gateway is simply a cache in front of IPFS. Cloudflare does not have the ability to modify or remove content from the IPFS network. If any abusive content is found that is served by the Cloudflare IPFS gateway, you can use the standard abuse reporting mechanism described here.

That's an interesting legal hypothesis for when they are prosecuted for serving kiddie porn, isn't it. Seems like they're taking on all of the same types of likely abuse as running a Tor exit node.

At a minimum, it seems like they will have to blacklist serving known abuse.


I think what they're trying to say here is that they may be able to hide/block it, but they can't remove it outright. The same applies to most of their proxying services; they don't have access to modify the origin.


I expect that they'll blacklist anything iffy. The ipfs.io gateway does, so hey. But you can run your own IPFS daemon and gateway - even as a Tor .onion service.


This kind of reminds me of usenet, in a good way.

It will be interesting to see what kind of uses this has over time, or if it will be something sustainable if it gains a lot of traction.

Good stuff!


Nicely written for more general audiences! Big news :) Congrats Juan, Matt, Michelle, and team if you see this!


Censorious centralization platform embraces censorship-resistant decentralized platform.


That's how we got the modern web, where decentralized web sites need Google Search as the entry point, decentralized blogs morphed into Facebook, and Cloudflare is going to monopolize the transport layer of all that.


A semi-self-hosted pastebin alternative with annotations :) (you may need to disable adblock)

https://genius.it/cloudflare-ipfs.com/ipfs/Qmc8qNo5adHBnLGSJ...


Is there a way to uniformly represent links and relationships between IPFS files in IPFS itself?


Yes, that's what IPLD (https://ipld.io/) is for, and more. It's used in IPFS for links between files in IPFS, and also for links between files and other formats like Git objects or Ethereum transactions.

You can see the data structure itself if you use the `ipfs object` commands:

    $ tree test-directory
    test-directory
    ├── directory-a
    │   └── file-b
    └── file-a
    
    1 directory, 2 files
    
    $ ipfs add -r test-directory
    added QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH test-directory/directory-a/file-b
    added QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH test-directory/file-a
    added QmVSi16sFvaKxJXGHWj9yCVcbxJBLRePBARKt1KUyWMGAK test-directory/directory-a
    added Qmbm9JEWeD8EX4pQUYpx4eDsjqyzGjQMdGzdJEdwJvcHb4 test-directory
    
    $ ipfs object get Qmbm9JEWeD8EX4pQUYpx4eDsjqyzGjQMdGzdJEdwJvcHb4
    {"Links":[{"Name":"directory-a","Hash":"QmVSi16sFvaKxJXGHWj9yCVcbxJBLRePBARKt1KUyWMGAK","Size":58},{"Name":"file-a","Hash":"QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH","Size":6}],"Data":"\u0008\u0001"}
    
    $ ipfs object get QmVSi16sFvaKxJXGHWj9yCVcbxJBLRePBARKt1KUyWMGAK
    {"Links":[{"Name":"file-b","Hash":"QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH","Size":6}],"Data":"\u0008\u0001"}


IPFS seems like torrents. The post doesn't really address how abuse will be handled beyond blocking access to a specific hash from Cloudflare's gateway.

Illegal files can and will be uploaded to IPFS; why would anyone run a node knowing they might be hosting child porn or other illegal content that could land them in jail?


You run a node because you have data to share. You're only sharing stuff you've already downloaded. Don't want to share illegal stuff? Don't download it.


That's not how IPFS works: if you haven't downloaded anything illegal via IPFS, you won't have anything illegal on your node.


I see. Thanks. I found a FAQ on this; it looks like additional stuff built on top of IPFS would require deny lists, but IPFS itself doesn't.

“Q: Will i store other people's stuff? A: No, by default IPFS will not download anything your node doesn't explicitly ask for. This is a strict design constraint. In order to build group archiving, and faster distribution, protocols are layered on top that may download content for the network, but these are optional and built on top of basic IPFS. Examples include bitswap agents, ipfs-cluster, and Filecoin.

Q: but bitswap says it may download stuff for others, to do better? A: yes, this is an extension of bitswap, not implemented yet, and will be either opt-in, or easy to opt-out and following the denylists (to avoid downloading bad bits).”


Technically, the original paper does call for ahead-of-time sharing of popular data blocks. So, for example, if piracy caught on quickly here, this functionality could push pirated data through.

Even though this was discussed in the initial paper, it was never implemented.


The file sharing community should adopt and thrive on this technology. Viva!


How does IPFS handle Denial of Service?


Can you expand on what kind of DoS you're thinking about? Files added to IPFS are stored locally (not pushed to other peers), up until some other node asks about it and/or tries to actually fetch it.


Yeah, I guess: are these discoverable or guessable? Could someone request my files at a rate I couldn't supply?



