How we put IPFS in Brave (2021) (ipfs.io)
211 points by behnamoh on May 24, 2022 | 100 comments



I've had extremely bad luck running go-ipfs at any scale. GC is braindead (it literally deletes the whole cache on a timer), networking is slow and drops tons of packets (apparently too much UDP?), and by default it stores each object 2 or 3 times. I'm sure it'll work fine for people using http://dweb.app, and probably go-ipfs will work okay for super casual browsing, but as soon as someone tries to download any substantial IPFS dataset, expect to run into resource limits everywhere.


Yup, I had a (tiny, spinning-rust) home server that slowed to a near halt. SSH logins would take minutes even when limiting IPFS to 2GiB of the 16GiB of RAM. Stopped go-ipfs and it was instantly snappy again.

My impression of the IPFS project is that the goals are excellent and the core protocol is quite good; however, they rewrite the higher-level layers far too frequently (for example, they have deprecated UnixFS, which seems to be the most used format, and they keep switching between JSON, Protocol Buffers and CBOR), and go-ipfs seems to be a pretty garbage codebase.


Any chance that, by limiting RAM usage, you forced your application to heavily swap, clogging the disk and making your machine slow?

I have run a public gateway on 2GB of RAM. Later 4GB because it was subject to very heavy abuse, but it was perfectly possible. Perhaps it is a matter of knowing how to configure things and how to not inflict self-pain with wrong settings.


Yes, there is definitely a chance, but it was wayyyy worse when I gave it more or unlimited RAM. At least this way the machine was operational most of the time. I don't think it was swapping, but since the limit I applied also affected the page cache, it was likely reading its data from disk a lot more often than it would have if it could own the whole page cache. But this is basically the same effect as swapping.

Maybe there is a Goldilocks value I could find, but I didn't really need IPFS running that much so I just removed it.


It does not delete the whole cache on a timer. It will delete blocks that are not pinned, periodically or when it reaches a high-water mark. It does not store each object 2 or 3 times. First, it doesn't refer to anything as an object but rather blocks, and a block is only stored once. It will only be replicated if you're running a cluster, in which case replication is the point.
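
For what it's worth, that policy boils down to roughly this toy Go sketch; the Blockstore/PinSet interfaces here are hypothetical placeholders, not the real go-ipfs types:

    // Toy sketch of the policy described above, not the real go-ipfs code:
    // unpinned blocks are removed once usage crosses a high-water mark (or
    // on a periodic timer); pinned blocks are always kept.
    type Blockstore interface {
        AllBlocks() []string // CIDs of locally stored blocks
        Delete(cid string)
    }

    type PinSet interface {
        IsPinned(cid string) bool
    }

    func maybeGC(store Blockstore, pins PinSet, usage, storageMax uint64, watermark float64) {
        if float64(usage) < float64(storageMax)*watermark {
            return // below the high-water mark: nothing is touched
        }
        for _, c := range store.AllBlocks() {
            if !pins.IsPinned(c) {
                store.Delete(c) // only unpinned blocks are collected
            }
        }
    }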


I don't really know anything about the Golang GC, but I would not be surprised if the process of scanning for unpinned blocks results in a lot of memory accesses. If too many useful things get evicted from the cache during that process, then I can see why GP is saying it deletes the whole cache.


Why would you ever start anything with, "I don't really know anything about the Golang GC but..." IPFS GC is separate from the Golang GC. IPFS GC deletes blocks on the local node that are not pinned. I'm not sure what you mean by "too many useful things". If it's not pinned it's evicted.


Hahaha oh wow. How embarrassing. I thought the original comment was talking about Golang's GC, as they did specifically mention go-ipfs.

I suppose that's your answer! Simple misunderstanding.


Golang has only one tunable GC parameter by design, so it may not be flexible enough for certain loads, but I learned that putting a ballast in RAM fixes the too-frequent GC sweeps.

https://eng.uber.com/how-we-saved-70k-cores-across-30-missio...
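
The ballast trick from that post boils down to roughly this minimal sketch; the 1 GiB size is just an illustrative choice:

    package main

    import (
        "runtime"
        "time"
    )

    func main() {
        // A large allocation that is never used: it raises the heap size
        // the Go GC targets (relative to GOGC), so collections run less
        // often for workloads with small live heaps but high churn.
        ballast := make([]byte, 1<<30) // 1 GiB, size to taste

        // ... run the real workload here ...
        time.Sleep(time.Minute)

        runtime.KeepAlive(ballast) // keep the ballast from being collected early
    }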


IPFS node garbage collection is not related to Golang GC.


Yep, I made an IPFS pinning service (the second one to exist, IIRC, and the first usable one), and I wish I hadn't. It's a bit of a trash fire.


If you have any specific complaints or experiences you'd like to share, I would be interested in hearing about them, but "it's a bit of a trash fire" is unhelpful.


hey there stavros, are you from pinata?



No, I'm not.


just checking


Same experience here. It's a real shame. I have the feeling IPFS is trying to do too much and became a bit of a bloated mess.

I love the idea of decentralized content-addressed storage and wish there were a more lightweight way to get there.


It was slow and buggy when it first released; that was understandable, so I waited a few years. I tried again recently, now that it's popular, and it is still an impractical proof of concept.


IPFS is especially heavy on bandwidth

if you plan to host IPFS at home and meanwhile do things on the internet then IPFS isn’t for you

although it’d be a good excuse to upgrade your home network


That seems like an insane usability trade-off that would limit adoption quite heavily


> if you plan to host IPFS at home and meanwhile do things on the internet then IPFS isn’t for you

Is it possible to limit the bandwidth and queue depth for IPFS?

I bet you could also lower its limit dynamically whenever web traffic is seen.


"Implement bandwidth limiting" https://github.com/ipfs/go-ipfs/issues/3065

Going on six years now. You can use external tools (like "trickle") or your OS knobs.


nope, the only real possibility (to lower bandwidth usage) is to disable peering

without p2p IPFS is nearly useless


So you're saying it's somehow impossible to deploy QoS management in your network to limit IPFS the same way you would limit anything else?

Presumably one could also run IPFS (or Brave) in its own VM, container, or hardware server and to rate limit traffic in and out of it.


IPFS content loads slowly enough without rate limiting

rate limiting will only make matters worse and again render IPFS nearly useless

they haven't managed to solve these issues for 6 years now


Am I understanding this correctly?

1. IPFS's bandwidth is so low that it is unusable.

2. IPFS's bandwidth usage is so high that it makes the network unusable.


Yes, that has roughly been my experience. The application throughput offered by IPFS is quite low while the packet throughput is very high. I was experiencing 5-15% packet loss over my internet connection while running IPFS. I'm not sure if a bandwidth limit would even help or if it is related to the number of connections.


There are different profiles you can select from. You might have the server profile enabled. Power save probably consumes the least, and you can opt out of sharing altogether. But the entire point is p2p.


I used to be very gung ho on IPFS, until I learned that the content ID does not depend solely on the content of the file. When one puts a file into the system one can choose different hashing algorithms, which will obviously cause the content ID to be different. However, even when using the same algorithm, the content ID will change depending on how the file is chunked. I would expect any sane system to consistently produce the same hash/content ID for a file. I can see that if the system is moving from SHA2 to SHA3 the content could be stored twice. I don't know whether they have changed things so that a consistent content ID is produced or not.


The content ID is not the hash of the content, it is the hash of the root of the Merkle DAG that carries the content.

Doing it like that has many advantages, like being able to verify hashes as small blocks are downloaded and not after downloading a huge file. Being able to de-duplicate data, being able to represent files, folders and any type of linked content-addressed data structure.

As long as your content is under 4MiB you can opt out of all this and have a content ID that is exactly the hash of the content.
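
A rough illustration, assuming the go-cid and go-multihash libraries and their current import paths: for a single raw block the CID is just the multihash of the bytes plus a codec prefix, while chunked files hash the DAG root instead.

    package main

    import (
        "fmt"

        cid "github.com/ipfs/go-cid"
        mh "github.com/multiformats/go-multihash"
    )

    func main() {
        data := []byte("hello ipfs")

        // A single raw block: the CID is a sha2-256 multihash of the raw
        // bytes plus a codec prefix, so the same bytes always give the
        // same CID.
        digest, err := mh.Sum(data, mh.SHA2_256, -1)
        if err != nil {
            panic(err)
        }
        fmt.Println(cid.NewCidV1(cid.Raw, digest))

        // Larger files are split into many such blocks and the CID is the
        // hash of the DAG root, so it depends on chunk size and DAG
        // layout, not only on the bytes themselves.
    }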


As I just replied to "cle", there are some disadvantages to doing it the way it is, because one can't predict what content ID will be produced. Perhaps having the hash of the entire contents of the file point to the hash that is currently the content ID would solve this issue. To me, IPFS does not seem useful unless this issue is solved. Also, multiple hashes (different algorithms) of the file could point to the content ID/Merkle DAG; so if SHA2 and SHA3 were both used and one of them had a security issue, then just use the one that is OK.


How would you produce the same hash for different encodings of data?


not sure that I follow what you are asking. I would expect if sha2-256 is used then the content ID would be the same. However, depending on how the content is chunked, the content ID will change. Two disadvantages that I see:

1. if new packages are produced for a release of open source software, could I see if there is a copy available via IPFS? No, because one can't predict how it would be chunked. So one would have to download the file and then derive a content ID, and one can only tell if it is available if the same chunking algorithm was used.

2. if I want to push a package or other binary, can I figure out if it is already available via IPFS? No, one can't.


Wow good to know!


Does anyone understand merkle clock CRDTs?

How do you handle conflicts where two concurrent events occur at the same time? Who wins? I know timestamps are not reliable but I want last write wins behaviour and seamless merge. The paper leaves data layer conflict resolution to the reader. It does suggest sorting by CID. I added a timestamp field for conflict resolution.

After reading the Merkle-DAGs meet CRDTs whitepaper I had a go at implementing a MerkleClock. It's incomplete; I still need to maintain the partial order of "occurs before".

https://github.com/samsquire/merkle-crdt

I also implemented part of the YATA algorithm yesterday. So I think I could merge the plain text merging functionality of that with the Merkle CRDT.

https://github.com/samsquire/yata


In https://github.com/ipfs/go-ds-crdt, every node in the Merkle DAG has a "Priority" field. When adding a new head, this is set to (maximum of the priorities of the children)+1.

Thus, this priority represents the current depth (or height) of the DAG at each node. It is sort of a timestamp and you could use a timestamp, or whatever helps you sort. In the case of concurrent writes, the write with highest priority wins. If we have concurrent writes of same priority, then things are sorted by CID.

The idea here is that in general, a node that is lagging behind or not syncing would have a dag with less depth, therefore its writes would have less priority when they conflict with writes from others that have built deeper DAGs. But this is after all an implementation choice, and the fact that a DAG is deeper does not mean that the last write on a key happened "later".
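A toy Go sketch of that rule (not the actual go-ds-crdt code): priority is the maximum of the children's priorities plus one, and concurrent writes with equal priority fall back to CID ordering.

    // Toy sketch of the head-priority rule described above.
    type node struct {
        cid      string // CID of this DAG node
        priority uint64 // depth/height of the DAG at this node
        children []*node
    }

    // newHead builds a new head on top of the current heads:
    // priority = max(children priorities) + 1.
    func newHead(c string, children []*node) *node {
        var maxP uint64
        for _, ch := range children {
            if ch.priority > maxP {
                maxP = ch.priority
            }
        }
        return &node{cid: c, priority: maxP + 1, children: children}
    }

    // wins reports whether concurrent write a beats write b:
    // highest priority wins, ties are broken by CID ordering.
    func wins(a, b *node) bool {
        if a.priority != b.priority {
            return a.priority > b.priority
        }
        return a.cid > b.cid
    }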


Thank you for answering this.

I thought of using user indexes by order of connection, or user last-active time, if you're not worried about the security implications of wall clocks and time skew.

The Hyperhyperspace project has a "previous" field on its CRDT operations and can issue undo events to reverse operations.

I suspect you could have a time-validator service: a node that issues revocations if times are in the future. It wouldn't be on the read or write path; it's just there to validate that times aren't in the future.


I thought of using Myers algorithm for seamless merging of concurrent updates as this would allow similar strings to remain.


I was interested so I implemented diff3 at https://github.com/samsquire/text-diff

It doesn't handle conflicts at this time. If the text can be automerged it shall be.


It's a little sad that Firefox isn't the first mobile browser to receive and experiment with new tech like IPFS. I do wonder if they solved the privacy issues with IPFS before they put it into Brave.

IPFS is probably the best contender for Web3 right now and I hope it'll see more use before the crypto bros take over the term completely


They could ship with IPFS/DAP/I2P/Tor native in Firefox right now, without any requirement of running external software, but choose not to. Instead, we get limited support for IPFS from a desktop-only addon that simply interfaces with an IPFS service already running on the host machine.

Take it a step further: Firefox could allow websites to open sockets and toss arbitrary packets around, and choose not to. If that capacity were available then Javascript could be harnessed to support all sorts of protocols and services. They could even provide Javascript access to monitoring network access point availability and connectivity management.

Imagine then a single page app you could share as an attachment through $messageService and it has all the stuff built in to create ad-hoc real networks in large gatherings that provide data resiliency against the dropping of nodes. You could have the cellular network shut down, protestors arrested, their phones taken, and the data they gathered still retained so long as any node managed to exit the area or the network itself expanded beyond the area of contention.


You have it backwards: stuff like WebSockets is built by design to be incompatible with existing implementations. This is because Javascript code is untrusted/untrustworthy, and we already had a plethora of attacks due to foreign JS doing nasty things with what little it had; here are a couple of examples:

- SMTP/IRC spamming using Web requests (Cross-protocol scripting, 2002) - https://www.eyeonsecurity.org/papers/Extended%20HTML%20Form%...

- Webpages that detect your router and leak your SSID (or worse) - Samy Kamkar "How I met your girlfriend" (2010), excerpt: https://www.youtube.com/watch?v=tRJMIMBVqFI

Web extensions should allow you to do normal sockets; many years ago I had a Chrome app (I still miss them) as my IRC client.


> Web extensions should allow you to do normal sockets

Not since 2017 or whenever it was that Firefox dropped XUL extensions and replaced them with WebExtensions. The legacy XUL extensions could do much, much more and there was correspondingly much, much more malware in browser extensions.


It's not like Websockets prevent this completely. eBay port scanning: https://www.ghacks.net/2020/05/25/ebay-is-port-scanning-your...


That's a pretty clever attack. It's clear everything can (will?) be exploited at some point, so it's usually down to features vs. user protection.

Unless everyone is ok going back to running random .exe files from emails, I guess.


So treat sockets as one currently treats web cameras and microphones.


> Firefox could allow websites to open sockets and toss arbitrary packets around, and choose not to.

There are very good security and privacy reasons that all browsers (not just Firefox) work extremely hard to prevent this from being possible.


So treat socket access as one does Microphone and Web Cam access.


The problem with that is that regular people (not super-techies) have a much better chance of understanding the implications of agreeing to microphone and webcam use than something called "socket access" - or any other more friendly term that tries to explain what's going on, because it's such a long way away from the level of abstraction that they are likely to understand.


"Allow this Website unrestricted access to the internet?"

Seems no more or less confusing or understandable than allowing access to microphone or camera.


I mean, I could honestly see a ton of confusion along the lines of “isn’t this website already on the internet? what??”


Also not knowing if disabling it will break the page, something even technically inclined people can't know ahead of time. It's not like push notifications where you would have to try hard to build pages that could break without the feature enabled. I could easily see people abusing this to serve pages over alternate protocols and making people expect to need to click "allow."


It's implied it will break or limit the page; just as denying access to microphone and video has that implication.


No it isn't? Unless you're doing something with the microphone or video there's no implication there.


Sure there is. If it needs hardware access for a function then without it that function breaks.


It's no more confusing than the warning provided for self signed certificates.


That requires people to know what "access to the internet" means.

A sizable portion of internet users think that the internet is what you get when you click the Facebook icon on your phone.

But they do at least understand what a microphone and webcam are for.


So make the default negative and appealing, and the positive option scary and diminished.

As Firefox does with self signed certificates and similar.


And your internal network.


Right - the original reason for same origin policies in browsers was to prevent scripts from stealing data from private corporate intranets.


An acquaintance of mine worked for Mozilla on a project to add Tor to Firefox. The code was done, but Google, as its primary funder, squashed it.


That makes sense. Google wants users to be easily identified and tracked; elsewise their primary revenue model, surveillance capitalism, would be under threat.


>They could ship with IPFS/DAP/I2P/Tor native in Firefox right now

A bit of a tangent, but I really cannot stress enough that if you're using Tor to be private/anonymous you should never use anything other than the official Tor Browser; otherwise you will stand out like a sore thumb.


Firefox has worked with IPFS since early 2019 https://addons.mozilla.org/en-US/firefox/addon/ipfs-companio...

Brave just took the step of forcing you to always have the extension installed instead of making it optional, basically

I've tried IPFS a time or two and always found it to be INCREDIBLY slow (even worse than tor) with ZERO content discoverability.


This still requires external software to operate, and isn't available on mobile. It's effectively dead in the water by not being available to use without additional configuration, by default.

I'd argue this is worse than doing nothing. This gave Firefox the ability to say they care, and yet not deliver something meaningful.


I agree. It just leads to "Oh, IPFS? I tried that years ago, it was terrible. I don't recommend trying it."

What do you think firefox could have done to improve things?

And, as someone not well versed, is there any "killer demo" that uses IPFS currently?


I'm working on a collaborative photogrammetry solution (think async/distributed 3d mapping from overlapping pictures) that shares data via IPFS. Flattering myself heavily, I believe this sort of public-data consuming application fits like nothing else.


You were going to contact me at the end of this week (morphle at ziggo dot nl) for the plant/species identification software: https://news.ycombinator.com/item?id=31537487.

Your collaborative photogrammetry can be combined with the open and free species identification API and my custom OpenStreetMap data extensions and KartaView/OpenStreetCam/OpenStreetView to get more photogrammetry location integration and more free crowdsourced open data to add to photogrammetry. A demo of Seadragon/Photosynth [1] inspired me to work on this.

[1] https://www.ted.com/talks/blaise_aguera_y_arcas_how_photosyn...


That sounds fascinating! Can you elaborate or point me to more information on it? I would love to hear your perspective on it from a real use case.


With pleasure, drop me a mail and I'll get back to you next week (last three letters of my username here @ rest of my username dot artificial intelligence). I haven't put anything online yet though sorry!


Firefox had its time; it is basically on life support, 80% dependent on Google's money, and almost always last to support such features. Brave seems to have made Firefox obsolete.

As for IPFS, the crypto bros seem to already be winning at taking over that term and melding it into a layer of web3, just like they did with 'crypto'; it's too late for that one, that ship has long sailed.

Perhaps the reason they are winning is that they keep building stuff like this [0] and existing companies are already jumping on board with the term? [1][2]

[0] https://skiff.com

[1] https://developers.cloudflare.com/web3

[2] https://stripe.com/blog/expanding-global-payouts-with-crypto


IPFS is a file system, Web3 is an idea/marketing term to promote blockchain services, so I'm not sure what you mean by that.


Brave is using IPFS for file storage, but once the content address (CID) is known anyone can access the file you're looking for. So it remains to be seen how they will leverage IPFS to create scarcity of digital items for their merchandise store; that is a step backward and not what IPFS's goals were. A huge number of books are currently on IPFS through libgen, and scihub is going to IPFS eventually. Web3 is just a step back from the greatness that the internet could be, with "decentralized" oracles (3 mining pools control Eth) and centralized front-facing websites simply verifying some hash of something.


Again, people use these meme terms without understanding. If I pin a file on IPFS, share the link and decide to delete it tomorrow because I don't like it anymore, that file is then unavailable and anyone trying to retrieve it gets an error. That's because IPFS isn't file storage. It's not Web3. It's not a blockchain. It's decentralized but not everything that's decentralized is an archive.


Web 3.0 has been used for a long time to mean any P2P/distributed/... approach, not just blockchain, even if the blockchain people try to completely take over the term sometimes.


Web3 has also been used to describe web pages designed for easy parsing.

Reader view, tools for the visually impaired, and browser automation are actually useful and commonly used, so that definition wins the title for me.

There are certainly useful distributed web tools (e.g. email, TOR, IRC, Matrix, self-hosting, bittorrent), but they're the opposite of recent trends towards monopoly.

The distributed meaning is absolutely poisoned by blockchain at this point.


Even blockchains are less monopolistic than web 2.0 (Google, Facebook, Amazon, etc). At least blockchains are powered by (largely) independent users, instead of a single corporate entity. But I still prefer further decentralized technologies like email or bittorrent


Web3 has a storage layer, a messaging layer, and an execution layer. Most popular Web3 apps use Ethereum for execution, IPFS for storage, and some custom websocket garbage for messaging, but there are many viable Web3 stacks out there that people are using.


What actually defines web3 software? Is sending emails with .exe attachments considered web3?

Like, if we compare this to RESTful servers, there's no set definition but nearly everyone agrees it's verbs and paths over a hierarchical API sending JSON back and forth over HTTP[S].

It seems like most people can't agree on anything except using Ethereum as a backbone.

So calling something web3 doesn't seem to do a good job describing things like REST or like something like you wrote above.


See also Web 2.0

It's not a technical spec.


That’s because IPFS is pretty bad in actual usage, as the anecdotes in this thread agree


Didn't realize 7+ years old is "new" in tech terms.


The actual title is "How we put IPFS in Brave" and that is a lot more interesting.


I have noticed that "How" at the beginning of submission titles seems to get removed automatically for some inexplicable reason.


corrected. HN’s algorithm had removed “how”.


I don’t understand what this implies/enables O.o could somebody be kind enough to explain? Thanks!


You can use the ipfs protocol instead of http.

It will start a local ipfs node in the background, which is basically a local webserver.


Specifically, it's a local process that participates in the global IPFS p2p network, and also exposes content via a local web server
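
For example, assuming the daemon is running with its default gateway on 127.0.0.1:8080, something like this sketch fetches content through it (the CID here is a placeholder):

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        someCID := "<cid of the content you want>" // placeholder

        // Ask the local node's HTTP gateway (go-ipfs default: port 8080);
        // the node fetches the blocks from peers and verifies them.
        resp, err := http.Get("http://127.0.0.1:8080/ipfs/" + someCID)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Printf("fetched %d bytes\n", len(body))
    }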


Sounds like fun in jurisdictions where seeding a torrent can get you a copyright lawsuit like my native Germany


This is a common misperception of IPFS. IPFS does not push any content onto your node or force you to host random content. It will only host content you have added or requested to be added to your node.


But if it's a protocol in your browser similar to HTTP, you could easily request something accidentally.

Someone just has to embed an illegal image or video in their webpage, and suddenly you're downloading and seeding it.


Why should I (or anyone else) care about IPFS? It's not like I have trouble storing and retrieving data currently. Moreover, every anecdote in this topic seems to be from people who found IPFS unusable in reality.


IPFS uses the content's hash as its address.

This means there are no needless duplicates, and if an IPFS node in your local network holds the data, you don't have to go to the source.


I don't understand why that would be more useful to me than my current storage and backup system.

I already don't store needless duplicates, and when I decide to store something, I don't have to go back to the source after I have stored it. That's the whole point.

How is IPFS helpful?

Has IPFS made anything happen that couldn't have happened without it?

What are its biggest accomplishments, outside of its own spread?


I think that this means you don’t have to download a copy of the file locally?


What I wonder is, how to handle cross-domain security in browsers when you have IPFS? Is there any standard for this yet?


I've been wondering this as well. In IPFS isn't -everything- cross-domain by definition? Each publication of a document would have a new identity and I'm not sure how you'd shoehorn that into something like CORS. You'd probably need something interactive, akin to OCSP vs CRLs.


Plug-and-play implementations like this are really how decentralized tech should be done and promoted.


So are there any interesting ipfs websites?


(2021)



