How we put IPFS in Brave (2021) (ipfs.io)
211 points by behnamoh on May 24, 2022 | 100 comments



I've had extremely bad luck running go-ipfs at any scale. GC is braindead (it literally deletes the whole cache on a timer), networking is slow and drops tons of packets (apparently too much UDP?), and by default it stores each object 2 or 3 times. I'm sure it'll work fine for people using http://dweb.app, and probably go-ipfs will work okay for super casual browsing, but as soon as someone tries to download any substantial IPFS dataset, expect to run into resource limits everywhere.


Yup, I had a (tiny, spinning-rust) home server that slowed to a near halt. SSH logins would take minutes even when limiting IPFS to 2GiB of the 16GiB of RAM. Stopped go-ipfs and it was instantly snappy again.

My impression of the IPFS project is that the goals are excellent and the core protocol is quite good; however, they rewrite the higher-level layers far too frequently (for example, they have deprecated UnixFS, which seems to be the most used format, and they keep switching between JSON, Protocol Buffers and CBOR), and go-ipfs seems to be a pretty garbage codebase.


Any chance that, by limiting RAM usage, you forced your application to heavily swap, clogging the disk and making your machine slow?

I have run a public gateway on 2GB of RAM. Later 4GB because it was subject to very heavy abuse, but it was perfectly possible. Perhaps it is a matter of knowing how to configure things and how to not inflict self-pain with wrong settings.


Yes, there is definitely a chance, but it was wayyyy worse when I gave it more or unlimited RAM. At least this way the machine was operational most of the time. I don't think it was swapping, but since the limit I applied also affected the page cache, it was likely reading its data from disk a lot more often than it would have if it could own the whole page cache. But this is basically the same effect as swapping.

Maybe there is a Goldilocks value I could find, but I didn't really need IPFS running that much so I just removed it.


It does not delete the whole cache on a timer. It will delete blocks that are not pinned, periodically or when it reaches a high-water mark. It does not store each object 2 or 3 times. First, it doesn't refer to anything as an object but rather blocks, and a block is only stored once. It will only be replicated if you're running a cluster, in which case replication is the point.
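
For what it's worth, that policy boils down to roughly this toy Go sketch; the Blockstore/PinSet interfaces here are hypothetical placeholders, not the real go-ipfs types:

    // Toy sketch of the policy described above, not the real go-ipfs code:
    // unpinned blocks are removed once usage crosses a high-water mark (or
    // on a periodic timer); pinned blocks are always kept.
    type Blockstore interface {
        AllBlocks() []string // CIDs of locally stored blocks
        Delete(cid string)
    }

    type PinSet interface {
        IsPinned(cid string) bool
    }

    func maybeGC(store Blockstore, pins PinSet, usage, storageMax uint64, watermark float64) {
        if float64(usage) < float64(storageMax)*watermark {
            return // below the high-water mark: nothing is touched
        }
        for _, c := range store.AllBlocks() {
            if !pins.IsPinned(c) {
                store.Delete(c) // only unpinned blocks are collected
            }
        }
    }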


I don't really know anything about the Golang GC, but I would not be surprised if the process of scanning for unpinned blocks results in a lot of memory accesses. If too many useful things get evicted from the cache during that process, then I can see why GP is saying it deletes the whole cache.


Why would you ever start anything with, "I don't really know anything about the Golang GC but..." IPFS GC is separate from the Golang GC. IPFS GC deletes blocks on the local node that are not pinned. I'm not sure what you mean by "too many useful things". If it's not pinned it's evicted.


Hahaha oh wow. How embarrassing. I thought the original comment was talking about Golang's GC, as they did specifically mention go-ipfs.

I suppose that's your answer! Simple misunderstanding.


Golang has only one tunable GC parameter by design, so it may not be flexible enough for certain loads, but I learned that putting a ballast in RAM fixes the too-frequent GC sweeps.

https://eng.uber.com/how-we-saved-70k-cores-across-30-missio...
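
The ballast trick from that post boils down to roughly this minimal sketch; the 1 GiB size is just an illustrative choice:

    package main

    import (
        "runtime"
        "time"
    )

    func main() {
        // A large allocation that is never used: it raises the heap size
        // the Go GC targets (relative to GOGC), so collections run less
        // often for workloads with small live heaps but high churn.
        ballast := make([]byte, 1<<30) // 1 GiB, size to taste

        // ... run the real workload here ...
        time.Sleep(time.Minute)

        runtime.KeepAlive(ballast) // keep the ballast from being collected early
    }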


IPFS node garbage collection is not related to Golang GC.


Yep, I made an IPFS pinning service (the second one to exist, IIRC, and the first usable one), and I wish I hadn't. It's a bit of a trash fire.


If you have any specific complaints or experiences you'd like to share, I would be interested in hearing about them, but "it's a bit of a trash fire" is unhelpful.


hey there stavros, are you from pinata?



No, I'm not.


just checking


Same experience here. It's a real shame. I have the feeling IPFS is trying to do too much and became a bit of a bloated mess.

I love the idea of decentralized content-addressed storage and wish there were a more lightweight way to get there.


It was slow and buggy when it first released; that was understandable, so I waited a few years. I tried again recently, now that it's popular, and it is still an impractical proof of concept.


IPFS is especially heavy on bandwidth

if you plan to host IPFS at home and meanwhile do things on the internet then IPFS isn’t for you

although it’d be a good excuse to upgrade your home network


That seems like an insane usability trade-off that would limit adoption quite heavily


> if you plan to host IPFS at home and meanwhile do things on the internet then IPFS isn’t for you

Is it possible to limit the bandwidth and queue depth for IPFS?

I bet you could also lower its limit dynamically whenever web traffic is seen.


"Implement bandwidth limiting" https://github.com/ipfs/go-ipfs/issues/3065

Going on six years now. You can use external tools (like "trickle") or your OS knobs.


nope, the only real possibility (to lower bandwidth usage) is to disable peering

without p2p IPFS is nearly useless


So you're saying it's somehow impossible to deploy QoS management in your network to limit IPFS the same way you would limit anything else?

Presumably one could also run IPFS (or Brave) in its own VM, container, or hardware server and to rate limit traffic in and out of it.


IPFS content loads slowly enough without rate limiting

rate limiting will only make matters worse and again render IPFS nearly useless

they haven't managed to solve these issues for 6 years now


Am I understanding this correctly?

1. IPFS's bandwidth is so low that it is unusable.

2. IPFS's bandwidth usage is so high that it makes the network unusable.


Yes, that has roughly been my experience. The application throughput offered by IPFS is quite low while the packet throughput is very high. I was experiencing 5-15% packet loss over my internet connection while running IPFS. I'm not sure if a bandwidth limit would even help or if it is related to the number of connections.


There are different profiles you can select from. You might have the server profile enabled. Power save probably consumes the least, and you can opt out of sharing altogether. But the entire point is p2p.


I used to be very gung ho on IPFS, until I learned that the content ID does not depend solely on the content of the file. When one puts a file into the system one can choose different hashing algorithms, which will obviously cause the content ID to be different. However, even when using the same algorithm, the content ID will change depending on how the file is chunked. I would expect any sane system to consistently produce the same hash/content ID for a file. I can see that if the system is moving from SHA2 to SHA3 the content could be stored twice. I don't know whether they have changed things so that a consistent content ID is produced or not.


The content ID is not the hash of the content, it is the hash of the root of the Merkle DAG that carries the content.

Doing it like that has many advantages, like being able to verify hashes as small blocks are downloaded and not after downloading a huge file. Being able to de-duplicate data, being able to represent files, folders and any type of linked content-addressed data structure.

As long as your content is under 4MiB you can opt out of all this and have a content ID that is exactly the hash of the content.
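
A rough illustration, assuming the go-cid and go-multihash libraries and their current import paths: for a single raw block the CID is just the multihash of the bytes plus a codec prefix, while chunked files hash the DAG root instead.

    package main

    import (
        "fmt"

        cid "github.com/ipfs/go-cid"
        mh "github.com/multiformats/go-multihash"
    )

    func main() {
        data := []byte("hello ipfs")

        // A single raw block: the CID is a sha2-256 multihash of the raw
        // bytes plus a codec prefix, so the same bytes always give the
        // same CID.
        digest, err := mh.Sum(data, mh.SHA2_256, -1)
        if err != nil {
            panic(err)
        }
        fmt.Println(cid.NewCidV1(cid.Raw, digest))

        // Larger files are split into many such blocks and the CID is the
        // hash of the DAG root, so it depends on chunk size and DAG
        // layout, not only on the bytes themselves.
    }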


As I just replied to "cle", there are some disadvantages to doing it the way it is, because one can't predict what content ID will be produced. Perhaps having the hash of the entire contents of the file point to the hash that is currently the content ID would solve this issue. To me, IPFS does not seem useful unless this issue is solved. Also, multiple hashes (different algorithms) of the file could point to the content ID/Merkle DAG; so if SHA2 and SHA3 were both used and one of them had a security issue, then just use the one that is OK.


How would you produce the same hash for different encodings of data?


not sure that I follow what you are asking. I would expect if sha2-256 is used then the content ID would be the same. However, depending on how the content is chunked, the content ID will change. Two disadvantages that I see:

1. if new packages are produced for a release of open source software, could I see if there is a copy available via IPFS? No, because one can't predict how it would be chunked. So one would have to download the file and then derive a content ID, and one can only tell if it is available if the same chunking algorithm was used.

2. if I want to push a package or other binary, can I figure out if it is already available via IPFS? No, one can't.


Wow good to know!


Does anyone understand merkle clock CRDTs?

How do you handle conflicts where two concurrent events occur at the same time? Who wins? I know timestamps are not reliable but I want last write wins behaviour and seamless merge. The paper leaves data layer conflict resolution to the reader. It does suggest sorting by CID. I added a timestamp field for conflict resolution.

After reading the Merkle-DAGs meet CRDTs whitepaper I had a go at implementing a MerkleClock. It's incomplete; I still need to maintain the partial order of "occurs before".

https://github.com/samsquire/merkle-crdt

I also implemented part of the YATA algorithm yesterday. So I think I could merge the plain text merging functionality of that with the Merkle CRDT.

https://github.com/samsquire/yata


In https://github.com/ipfs/go-ds-crdt, every node in the Merkle DAG has a "Priority" field. When adding a new head, this is set to (maximum of the priorities of the children)+1.

Thus, this priority represents the current depth (or height) of the DAG at each node. It is sort of a timestamp and you could use a timestamp, or whatever helps you sort. In the case of concurrent writes, the write with highest priority wins. If we have concurrent writes of same priority, then things are sorted by CID.

The idea here is that in general, a node that is lagging behind or not syncing would have a dag with less depth, therefore its writes would have less priority when they conflict with writes from others that have built deeper DAGs. But this is after all an implementation choice, and the fact that a DAG is deeper does not mean that the last write on a key happened "later".
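A toy Go sketch of that rule (not the actual go-ds-crdt code): priority is the maximum of the children's priorities plus one, and concurrent writes with equal priority fall back to CID ordering.

    // Toy sketch of the head-priority rule described above.
    type node struct {
        cid      string // CID of this DAG node
        priority uint64 // depth/height of the DAG at this node
        children []*node
    }

    // newHead builds a new head on top of the current heads:
    // priority = max(children priorities) + 1.
    func newHead(c string, children []*node) *node {
        var maxP uint64
        for _, ch := range children {
            if ch.priority > maxP {
                maxP = ch.priority
            }
        }
        return &node{cid: c, priority: maxP + 1, children: children}
    }

    // wins reports whether concurrent write a beats write b:
    // highest priority wins, ties are broken by CID ordering.
    func wins(a, b *node) bool {
        if a.priority != b.priority {
            return a.priority > b.priority
        }
        return a.cid > b.cid
    }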


Thank you for answering this.

I thought of using user indexes by order of connection, or user last-active time, if you're not worried about the security implications of wall clocks and time skew.

The Hyperhyperspace project has a "previous" field on its CRDT operations and can issue undo events to reverse operations.

I suspect you could have a time-validator service: a node that issues revocations if times are in the future. It wouldn't be on the read or write path; it's just there to validate that times aren't in the future.


I thought of using Myers algorithm for seamless merging of concurrent updates as this would allow similar strings to remain.


I was interested so I implemented diff3 at https://github.com/samsquire/text-diff

It doesn't handle conflicts at this time. If the text can be automerged it shall be.


It's a little sad that Firefox isn't the first mobile browser to receive and experiment with new tech like IPFS. I do wonder if they solved the privacy issues with IPFS before they put it into Brave.

IPFS is probably the best contender for Web3 right now and I hope it'll see more use before the crypto bros take over the term completely


They could ship with IPFS/DAP/I2P/Tor native in Firefox right now, without any requirement of running external software, but choose not to. Instead, we get limited support for IPFS from a desktop-only addon that simply interfaces with an IPFS service already running on the host machine.

Take it a step further: Firefox could allow websites to open sockets and toss arbitrary packets around, and choose not to. If that capacity were available then Javascript could be harnessed to support all sorts of protocols and services. They could even provide Javascript access to monitoring network access point availability and connectivity management.

Imagine then a single page app you could share as an attachment through $messageService and it has all the stuff built in to create ad-hoc real networks in large gatherings that provide data resiliency against the dropping of nodes. You could have the cellular network shut down, protestors arrested, their phones taken, and the data they gathered still retained so long as any node managed to exit the area or the network itself expanded beyond the area of contention.


You have it backwards: stuff like WebSockets is built by design to be incompatible with existing implementations. This is because Javascript code is untrusted/untrustworthy, and we already had a plethora of attacks due to foreign JS doing nasty things with what little it had; here are a couple of examples:

- SMTP/IRC spamming using Web requests (Cross-protocol scripting, 2002) - https://www.eyeonsecurity.org/papers/Extended%20HTML%20Form%...

- Webpages that detect your router and leak your SSID (or worse) - Samy Kamkar "How I met your girlfriend" (2010), excerpt: https://www.youtube.com/watch?v=tRJMIMBVqFI

Web extensions should allow you to do normal sockets; many years ago I had a Chrome app (I still miss them) as my IRC client.


> Web extensions should allow you to do normal sockets

Not since 2017 or whenever it was that Firefox dropped XUL extensions and replaced them with WebExtensions. The legacy XUL extensions could do much, much more and there was correspondingly much, much more malware in browser extensions.


It's not like Websockets prevent this completely. eBay port scanning: https://www.ghacks.net/2020/05/25/ebay-is-port-scanning-your...


That's a pretty clever attack. It's clear everything can (will?) be exploited at some point, so it's usually down to features vs. user protection.

Unless everyone is ok going back to running random .exe files from emails, I guess.


So treat sockets as one currently treats web cameras and microphones.


> Firefox could allow websites to open sockets and toss arbitrary packets around, and choose not to.

There are very good security and privacy reasons that all browsers (not just Firefox) work extremely hard to prevent this from being possible.


So treat socket access as one does Microphone and Web Cam access.


The problem with that is that regular people (not super-techies) have a much better chance of understanding the implications of agreeing to microphone and webcam use than something called "socket access" - or any other more friendly term that tries to explain what's going on, because it's such a long way away from the level of abstraction that they are likely to understand.


"Allow this Website unrestricted access to the internet?"

Seems no more or less confusing or understandable than allowing access to microphone or camera.


I mean, I could honestly see a ton of confusion along the lines of “isn’t this website already on the internet? what??”


Also not knowing if disabling it will break the page, something even technically inclined people can't know ahead of time. It's not like push notifications where you would have to try hard to build pages that could break without the feature enabled. I could easily see people abusing this to serve pages over alternate protocols and making people expect to need to click "allow."


It's implied it will break or limit the page; just as denying access to microphone and video has that implication.


No it isn't? Unless you're doing something with the microphone or video there's no implication there.


Sure there is. If it needs hardware access for a function then without it that function breaks.


It's no more confusing than the warning provided for self signed certificates.


That requires people to know what "access to the internet" means.

A sizable portion of internet users think that the internet is what you get when you click the Facebook icon on your phone.

But they do at least understand what a microphone and webcam are for.


So make the default negative and appealing, and the positive option scary and diminished.

As Firefox does with self signed certificates and similar.


And your internal network.


Right - the original reason for same origin policies in browsers was to prevent scripts from stealing data from private corporate intranets.


An acquaintance of mine worked for Mozilla on a project to add Tor to Firefox. The code was done, but Google, as its primary funder, squashed it.


That makes sense. Google wants users to be easily identified and tracked; elsewise their primary revenue model, surveillance capitalism, would be under threat.


>They could ship with IPFS/DAP/I2P/Tor native in Firefox right now

A bit of a tangent, but I really cannot stress enough that if you're using Tor to be private/anonymous you should never use anything other than the official Tor Browser; otherwise you will stand out like a sore thumb.


Firefox has worked with IPFS since early 2019 https://addons.mozilla.org/en-US/firefox/addon/ipfs-companio...

Brave just took the step of forcing you to always have the extension installed instead of making it optional, basically

I've tried IPFS a time or two and always found it to be INCREDIBLY slow (even worse than tor) with ZERO content discoverability.


This still requires external software to operate, and isn't available on mobile. It's effectively dead in the water by not being available to use without additional configuration, by default.

I'd argue this is worse than doing nothing. This gave Firefox the ability to say they care, and yet not deliver something meaningful.


I agree. It just leads to "Oh, IPFS? I tried that years ago, it was terrible. I don't recommend trying it."

What do you think firefox could have done to improve things?

And, as someone not well versed, is there any "killer demo" that uses IPFS currently?


I'm working on a collaborative photogrammetry solution (think async/distributed 3d mapping from overlapping pictures) that shares data via IPFS. Flattering myself heavily, I believe this sort of public-data consuming application fits like nothing else.


You were going to contact me at the end of this week (morphle at ziggo dot nl) for the plant/species identification software: https://news.ycombinator.com/item?id=31537487.

Your collaborative photogrammetry can be combined with the open and free species identification API and my custom OpenStreetMap data extensions and KartaView/OpenStreetCam/OpenStreetView to get more photogrammetry location integration and more free crowdsourced open data to add to photogrammetry. A demo of Seadragon/Photosynth [1] inspired me to work on this.

[1] https://www.ted.com/talks/blaise_aguera_y_arcas_how_photosyn...


That sounds fascinating! Can you elaborate or point me to more information on it? I would love to hear your perspective on it from a real use case.


With pleasure, drop me a mail and I'll get back to you next week (last three letters of my username here @ rest of my username dot artificial intelligence). I haven't put anything online yet though sorry!


Firefox had its time; it is basically on life support, 80% dependent on Google's money, and almost always last to support such features. Brave seems to have made Firefox obsolete.

As for IPFS, the crypto bros seem to already be winning at taking over that term and melding it into a layer of web3, just like they did with 'crypto'; it's too late for that one, that ship has long sailed.

Perhaps the reason they are winning is that they keep building stuff like this [0] and existing companies are already jumping on board with the term? [1][2]

[0] https://skiff.com

[1] https://developers.cloudflare.com/web3

[2] https://stripe.com/blog/expanding-global-payouts-with-crypto


IPFS is a file system, Web3 is an idea/marketing term to promote blockchain services, so I'm not sure what you mean by that.


Brave is using IPFS for file storage, but once the content address (CID) is known anyone can access the file you're looking for. So it remains to be seen how they will leverage IPFS to create scarcity of digital items for their merchandise store; that is a step backward and not what IPFS's goals were. A huge number of books are currently on IPFS through libgen, and scihub is going to IPFS eventually. Web3 is just a step back from the greatness that the internet could be, with "decentralized" oracles (3 mining pools control Eth) and centralized front-facing websites simply verifying some hash of something.


Again, people use these meme terms without understanding. If I pin a file on IPFS, share the link and decide to delete it tomorrow because I don't like it anymore, that file is then unavailable and anyone trying to retrieve it gets an error. That's because IPFS isn't file storage. It's not Web3. It's not a blockchain. It's decentralized but not everything that's decentralized is an archive.


Web 3.0 has been used for a long time to mean any P2P/distributed/... approach, not just blockchain, even if the blockchain people try to completely take over the term sometimes.


Web3 has also been used to describe web pages designed for easy parsing.

Reader view, tools for the visually impaired, and browser automation are actually useful and commonly used, so that definition wins the title for me.

There are certainly useful distributed web tools (e.g. email, TOR, IRC, Matrix, self-hosting, bittorrent), but they're the opposite of recent trends towards monopoly.

The distributed meaning is absolutely poisoned by blockchain at this point.


Even blockchains are less monopolistic than web 2.0 (Google, Facebook, Amazon, etc). At least blockchains are powered by (largely) independent users, instead of a single corporate entity. But I still prefer further decentralized technologies like email or bittorrent


Web3 has a storage layer, a messaging layer, and an execution layer. Most popular Web3 apps use Ethereum for execution, IPFS for storage, and some custom websocket garbage for messaging, but there are many viable Web3 stacks out there that people are using.


What actually defines web3 software? Is sending emails with .exe attachments considered web3?

Like, if we compare this to RESTful servers, there's no set definition but nearly everyone agrees it's verbs and paths over a hierarchical API sending JSON back and forth over HTTP[S].

It seems like most people can't agree on anything except using Ethereum as a backbone.

So calling something web3 doesn't seem to do a good job describing things like REST or like something like you wrote above.


See also Web 2.0

It's not a technical spec.


That’s because IPFS is pretty bad in actual usage, as the anecdotes in this thread agree


Didn't realize 7+ years old is "new" in tech terms.


The actual title is "How we put IPFS in Brave" and that is a lot more interesting.


I have noticed that "How" at the beginning of submission titles seems to get removed automatically for some inexplicable reason.


corrected. HN’s algorithm had removed “how”.


I don’t understand what this implies/enables O.o could somebody be kind enough to explain? Thanks!


You can use the ipfs protocol instead of http.

It will start a local ipfs node in the background, which is basically a local webserver.


Specifically, it's a local process that participates in the global IPFS p2p network, and also exposes content via a local web server
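
For example, assuming the daemon is running with its default gateway on 127.0.0.1:8080, something like this sketch fetches content through it (the CID here is a placeholder):

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        someCID := "<cid of the content you want>" // placeholder

        // Ask the local node's HTTP gateway (go-ipfs default: port 8080);
        // the node fetches the blocks from peers and verifies them.
        resp, err := http.Get("http://127.0.0.1:8080/ipfs/" + someCID)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Printf("fetched %d bytes\n", len(body))
    }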


Sounds like fun in jurisdictions where seeding a torrent can get you a copyright lawsuit like my native Germany


This is a common misperception of IPFS. IPFS does not push any content onto your node or force you to host random content. It will only host content you have added or requested to be added to your node.


But if it's a protocol in your browser similar to HTTP, you could easily request something accidentally.

Someone just has to embed an illegal image or video in their webpage, and suddenly you're downloading and seeding it.


Why should I (or anyone else) care about IPFS? It's not like I have trouble storing and retrieving data currently. Moreover, every anecdote in this topic seems to be from people who found IPFS unusable in reality.


IPFS uses the content's hash as its address.

This means there are no needless duplicates, and if an IPFS node in your local network holds the data, you don't have to go to the source.


I don't understand why that would be more useful to me than my current storage and backup system.

I already don't store needless duplicates, and when I decide to store something, I don't have to go back to the source after I have stored it. That's the whole point.

How is IPFS helpful?

Has IPFS made anything happen that couldn't have happened without it?

What are its biggest accomplishments, outside of its own spread?


I think that this means you don’t have to download a copy of the file locally?


What I wonder is, how to handle cross-domain security in browsers when you have IPFS? Is there any standard for this yet?


I've been wondering this as well. In IPFS isn't -everything- cross-domain by definition? Each publication of a document would have a new identity and I'm not sure how you'd shoehorn that into something like CORS. You'd probably need something interactive, akin to OCSP vs CRLs.


Plug-and-play implementations like this are really how decentralized tech should be done and promoted.


So are there any interesting ipfs websites?


(2021)



