Hacker News
Arc – A P2P CDN that runs in the browser (arc.io)
102 points by sansui12 on Sept 5, 2021 | 68 comments



> only runs when a device is connected to Wi-Fi or Ethernet. Cellular bandwidth is never used.

This cannot be guaranteed.

They seem to use the NetworkInformation API, which as of now is declared as experimental and straight up does not work on Firefox or more importantly on Safari (and therefore the whole iOS browser ecosystem). Apart from that, they're using IP ranges of mobile carriers for detection. [1]

So let's say you're using iOS and a VPN while on the go. This will use your mobile data unless you're using an adblocker (since their service is only opt-in if you're using an adblocker) [2]

[1] https://github.com/easylist/easylist/issues/7872 [2] https://github.com/uBlockOrigin/uAssets/pull/8874#issue-6142...


Hey guys! I'm Ansgar (https://github.com/gruns). I build Arc (http://arc.io/cdn).

In the past, I built two of the world's largest YouTube to MP3 converters. In doing so, I learned the hard way 1) that distributing content at scale globally is painful and _expensive_ and 2) that I hate ads. So I built Arc.

It's a two-sided content exchange. On one side, websites buy a faster, 10x cheaper peer-to-peer CDN. On the other side, websites make money without ads by contributing bandwidth to the peer-to-peer CDN. Arc connects the two, like Airbnb connects guests and hosts.

As bandwidth capacity grows around the globe, we find ourselves in a world where people can share bandwidth both beneficially and imperceptibly. We see glimpses of that already today: Amazon shares bandwidth with Amazon Sidewalk, Microsoft shares bandwidth with Windows updates, etc. We build Arc for this world -- a post-adblock world -- to give sites a better, more ethical way to support themselves, one that preserves users' privacy instead of bombarding them with ads and sucking up their personal data.

A few notes:

- For sites that use Arc's CDN, users do not use upload bandwidth and Arc's widget isn't displayed. It's just a faster, 10x cheaper CDN in one <script> tag. That's it. (See https://arc.io/faq#do-users-upload-content-with-just-the-cdn)

- For sites that monetize with Arc, we mandate that Arc's widget remains visible and interactable in the lower left corner of your website so users can learn about Arc and, if they so desire, opt out. (See https://arc.io/faq#can-i-move-modify-or-hide-arcs-widget) Additionally, Arc never activates on cellular connections; Wi-Fi and Ethernet only.

- If you elect to opt out (two clicks in Arc's widget), you're opted out of all sites with Arc.

Please email me if you'd like an invite code to take Arc's CDN for a spin: ansgar@arc.io. It's 10x cheaper than Fastly, AWS, Google, etc. (See http://arc.io/cdn/). And I'd love to hear your thoughts! Feedback is how good products become great.


I’m curious what your analysis of the equilibrium states of this model looks like. I think it’s an interesting approach, but I’m not entirely convinced it’s sustainable.

Edit: TL;DR: show me the math! :D


How does Arc make P2P connections from a Service Worker? IIRC WebRTC isn't available.


> only runs when a device is connected to Wi-Fi or Ethernet. Cellular bandwidth is never used.

That's interesting. How does the website possibly detect that I'm on a cellular network when I a) use Firefox (which doesn't implement the NetworkInformation API) and b) use a VPN to either a data center or my home server? What about iOS devices using Apple's pseudo-Tor (because Safari doesn't expose NetworkInformation either), or devices using tethered WiFi?

I don't believe this claim at all. This makes me take all the other claims they make with a massive grain of salt as well.

There's also the massive privacy issue: other people will know what websites you visit by simply using the P2P system, and the entire thing seems to be opt-out unless you use an adblocker. That last part shows that the devs know of the privacy issue but have decided to take the practical approach of not fixing the issue and only doing the bare minimum to remove their website from blocklists.

It's only a matter of time before someone will make a tool that enumerates all IPs in the Arc network together with the content they've been served. This one is going onto my Pihole's blocklist...


Great questions!

> How does the website possibly detect that I'm on a cellular network when I a) use Firefox (which doesn't implement the NetworkInformation API) and b) use a VPN to either a data center or my home server?

Yep. An IP lookup is also done to see which AS the user is on, e.g. an AWS IP vs. a T-Mobile IP.

We also do this to detect when people tether: when tethering, Chrome will report a Wi-Fi connection via the NetworkInformation API, but it's Wi-Fi on top of an underlying cellular connection.
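The two signals described above (the browser's reported connection type and the AS that owns the client's IP) can be combined into a single decision. The following is only a sketch of that idea, not Arc's actual logic: the `AsnKind` classification, the `may_use_upload` function, and the conservative fallback policy are all assumptions made for illustration. The `connection_type` values mirror the NetworkInformation `type` field, with `None` standing in for browsers that do not expose the API.

```python
from enum import Enum
from typing import Optional


class AsnKind(Enum):
    """Hypothetical classification of the AS that owns the client's IP."""
    RESIDENTIAL = "residential"
    MOBILE_CARRIER = "mobile_carrier"
    DATACENTER = "datacenter"  # e.g. a VPN exit node or cloud host
    UNKNOWN = "unknown"


def may_use_upload(connection_type: Optional[str], asn_kind: AsnKind) -> bool:
    """Decide whether a peer may contribute upload bandwidth.

    connection_type is "wifi", "ethernet", "cellular", etc., or None when
    the browser does not expose the NetworkInformation API.
    """
    # A carrier-owned IP implies cellular, even if the browser reports
    # "wifi" (the tethering case).
    if asn_kind is AsnKind.MOBILE_CARRIER:
        return False
    if connection_type == "cellular":
        return False
    if connection_type in ("wifi", "ethernet"):
        return True
    # No browser signal at all: assumed policy is to trust only a
    # residential AS and refuse everything else.
    return connection_type is None and asn_kind is AsnKind.RESIDENTIAL
```

Under this assumed policy, an iOS device on a VPN (`None`, `DATACENTER`) is refused. A tethered Chrome device behind a VPN still reports `"wifi"` with a data-center AS and would be accepted, which is the gap raised earlier in the thread.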

> There's also the massive privacy issue: other people will know what websites you visit by simply using the P2P system

All cached data is both fragmented and encrypted. When a node sends data to another peer, it's an encrypted fragment of a file and the sender doesn't know 1) what data it's sending nor 2) which website that data is being sent for.


It's an interesting idea. I did think it looked pretty scammy (felt like a crypto miner), but...

It does feel different in that it doesn't use any user's bandwidth if the site doesn't monetize it, and it mandates a UI with an opt-out button if the site does monetize it and uses the user's bandwidth.

But I'm not sure if this is ever gonna work... will a nearby user's device really deliver the content I need better than a datacenter would? There's also no mention on the FAQ page of how it prevents a fake user from sending malicious scripts across the network.


> There's also no mention on the FAQ page of how it prevents a fake user from sending malicious scripts across the network.

All content is fragmented, encrypted, and hashed before it's distributed across the network. If a peer ever receives a file piece from a peer and that piece's hash doesn't match the expected hash, it's dropped along with the connection to that peer.
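The check described above resembles piece verification in other P2P systems. Here is a minimal sketch of it, assuming SHA-256 as the hash (the thread does not say which algorithm Arc uses) and assuming the list of expected hashes comes from a trusted source rather than from the sending peer. Encryption is left out, and the function names are hypothetical.

```python
import hashlib
from typing import List


def split_into_pieces(data: bytes, piece_size: int) -> List[bytes]:
    """Fragment a file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: List[bytes]) -> List[str]:
    """Compute the expected hash of every piece (assumed: SHA-256)."""
    return [hashlib.sha256(piece).hexdigest() for piece in pieces]


def accept_piece(piece: bytes, expected_hash: str) -> bool:
    """Return True only if the received piece matches its expected hash.

    A caller would drop the piece, and the connection to the peer that
    sent it, when this returns False.
    """
    return hashlib.sha256(piece).hexdigest() == expected_hash
```

A receiver would fetch the expected hashes first, then call `accept_piece` on each incoming fragment before reassembling the file. How much protection this gives depends on the hash function, since a forged piece would have to produce the same digest as the genuine one.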


> All content is fragmented, encrypted, and hashed before it's distributed across the network. If a peer ever receives a file piece from a peer and that piece's hash doesn't match the expected hash, it's dropped along with the connection to that peer.

So, depending on the hashing algorithm, an adversary could use a hash collision to circumvent this, right?


I wonder how long it will take before arc.io appears on the same content blacklists as ads and cryptominers?



So uBlock added them to the abuse list for not being opt-in, and the developer responded by adding opt-in only if you're using an adblocker. Way to completely miss the intent of the rule.


>Arc uses only a small portion of spare bandwidth, imperceptible CPU, 300MB of browser cache, and only runs when a device is connected to Wi-Fi or Ethernet. Cellular bandwidth is never used.

Ugh, the arrogance. To think that you are somehow entitled to use my bandwidth and CPU power is such a sadly typical mindset today.

Yet another reason to run NoScript.


Well, you're using CPU and bandwidth as long as you're using assets delivered with that CDN. Seems perfectly fair to me.


And that's for each page that uses it, I assume, so if you have 10 pages using Arc, that's not going to be imperceptible.


Not quite. =] Arc coordinates and synchronizes across tabs (via an iframe). So Arc behaves identically whether you have one open tab with Arc or 100.


Aren't tabs supposed to be sandboxed?


Origins are what should be sandboxed, rather than tabs, aren't they?


Sounds like the iframe is used to glue sandboxes together


Nicely done.


I wonder what the time to first byte is for a given asset when the page loads for the first time. Establishing a WebRTC connection takes several RTTs to the tracker/signalling server and will always be longer than a plain CDN request. I think it's rather useful on subsequent page navigation, then, when P2P connections are already established.
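The round-trip argument above can be turned into rough arithmetic. The sketch below is a toy model, not a measurement: the default round-trip counts (`handshake_rtts`, `signal_rtts`, `peer_setup_rtts`) and the function names are assumptions chosen for illustration, and real values depend on the protocols and network in use.

```python
def cdn_ttfb_ms(rtt_cdn_ms: float, handshake_rtts: int = 2) -> float:
    """Time to first byte for a cold CDN request.

    Assumes handshake_rtts round trips for connection setup plus one
    round trip for the request and the first byte of the response.
    """
    return rtt_cdn_ms * (handshake_rtts + 1)


def p2p_ttfb_ms(
    rtt_signal_ms: float,
    rtt_peer_ms: float,
    signal_rtts: int = 2,
    peer_setup_rtts: int = 4,
    connected: bool = False,
) -> float:
    """Time to first byte when fetching from a peer.

    Cold: signalling round trips to the server, then peer connection
    setup, then one round trip for the request itself.
    Warm (connected=True): only the request round trip remains.
    """
    if connected:
        return rtt_peer_ms
    return rtt_signal_ms * signal_rtts + rtt_peer_ms * (peer_setup_rtts + 1)
```

With an assumed 20 ms RTT to the CDN, 40 ms to the signalling server, and 10 ms to a nearby peer, the cold CDN request comes to 60 ms and the cold P2P fetch to 130 ms, while a fetch over an already established peer connection costs 10 ms. That matches the comment's intuition that P2P pays off mainly on subsequent navigations.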

What about partnering with browsers like Brave, that would embed it in many pages at once?

Disclaimer: I work at Lumen/Streamroot, we do this for video.


Before this is loaded and completely ready, you probably load resources with your regular CDN


So that you can seed it later, very probably


So, if I run this on a server and fill my cache, can I then monitor other people browsing arc.io-enabled websites?

I feel the privacy implications of such a technology haven’t been evaluated.


Great question! All cached data is both fragmented and encrypted. When a node sends data to another peer, it doesn't know 1) what data it's sending nor 2) which website that data is being sent for.


PeerCDN tried this and was acqui-hired by Yahoo in 2013 and Peer5 did this for video but was acqui-hired by Microsoft. What about P2P CDN didn't work before?


PeerCDN was too early; the web wasn't ready yet.

For example, to intercept network requests to serve them from the peer-to-peer network, a <![CDATA[ tag had to be inserted into the <head> of the document to block rendering of the subsequent document. Then, once the document had finished downloading, the page HTML was manually rendered so all assets (e.g. JS, CSS, and image tags) could be loaded via JS instead of natively by the browser. This was both slow and resulted in empty white pages on load. Now? We have the Service Worker API. (https://developer.mozilla.org/en-US/docs/Web/API/Service_Wor...)

I'm not as familiar with Peer5's tech stack, so I can't speak there. But hi Shachar!


Lots of people would go to great lengths to secure a job at Microsoft, so I'm not sure it counts as a failure, just not a runaway success.


Internal sources say Peer5 were not acqui-hired ;) But we took a very different direction than this or PeerCDN


> Arc uses only a small portion of spare bandwidth, imperceptible CPU, 300MB of browser cache

Sounds very scammy. Is this happening without user consent? What about privacy?


This is my first time hearing about it too, but I don't see what you see. Where is the scam? What about privacy?

It seems like a good way of monetizing a site by creating something with intrinsic value to others (CDN capacity) out of something with next to no value to users (spare bandwidth). Sure beats the alternatives:

- ads: the value is users' money, which ads manipulate them into spending. Additionally, they usually track people relentlessly and in turn use that data to manipulate them even better. No value is created, just money is transferred with extra steps.

- crypto miners: the value of (PoW) crypto is net negative (see environmental concerns), despite being profitable for some. They also decrease the user experience by draining batteries and slowing down computers. So no real value is created and negative value for users.

- micro-donations: they just transfer something of value (money) from the user to the operator. Nothing is "created", so users are by definition losing money. You can argue that things should cost money, but that's a separate discussion - value for users is still negative.

Of course I'd prefer this to be a vendor-neutral standard and not some private company, but none of the current "distributed web" solutions have gotten any serious level of adoption. This one actually got some attention since it's also a monetization platform - even if webmasters don't care about the distributed web idea, they still help get us closer.


Using resources that belong to the user (CPU time, RAM, disk space, battery) for your own needs without the user's consent is very questionable.

On top of that, this P2P tool advertises the user's IP address to other peers without their consent. Furthermore, it allows other users to see what content is cached and at what time of day.

"Sure beats the alternatives" is in no way less scammy.


Be careful: providing upload bandwidth also means consuming battery. The radio on a phone can't sleep while uploading.


It's scary how things like this can become invisible to the user. WebRTC is a great concept that could really improve decentralization, but it's so incredibly easy to misuse that it's absurdly irresponsible that browser developers enable it by default. Remember how people went to jail for simply having certain p2p file sharing software running? All this one takes is opening a web page.


What's so irresponsible about it? Any website can waste your download/upload/CPU/RAM/disk. WebRTC does not make anything worse, but rather allows for plenty of useful applications, some of which might help to reduce centralization, which is the plague of the web IMO.


>waste your download/upload/CPU/RAM/disk

Those are present here but not the largest threat. This service apparently makes you download ~300MB of content you know nothing about and SHARES it in the background. What if it is child abuse content? This isn't even a "think of the children" argument; it can have very real consequences for anyone cluelessly browsing the web, unlike your usual p2p network.

WebRTC has additional insecurities and deanonymization potential like VPN leaks. I'm fairly certain the only reason so many people have it enabled is that a certain large browser distributor happens to own some services that use WebRTC to reduce server bandwidth costs, and they want to make its usage seamless without displaying even a small warning about the risks. Perhaps they also consider deanonymization a feature.


> Those are present here but not the largest threat. This service apparently makes you download ~300MB of content you know nothing about and SHARES it in the background. What if it is child abuse content? This isn't even a "think of the children" argument; it can have very real consequences for anyone cluelessly browsing the web, unlike your usual p2p network.

HN can upload 300 MB of child abuse content to your computer. WebRTC is not required for that. Even JavaScript is not required for that: create tiny 1px images and link those to (un)desired content, and they'll be cached in your browser according to cache headers, possibly indefinitely. That's the nature of the web: you have very little control over your browser unless you scrutinize every HTTP response, which nobody does.

> WebRTC has additional insecurities and deanonymization potential like VPN leaks.

Legitimate issues should be fixed or put behind prompts, I agree with that. But issues inherent to the Web should not be blamed on WebRTC IMO.


Yep, this is the same reason I uninstalled Manyverse (based on https://scuttlebutt.nz/) from my phone again: even though none of my friends were on it, joining any public room could download any sort of content onto my phone, which I would then be responsible for. Cool idea in theory, questionable abusability in practice.


> Remember how people went to jail for simply having certain p2p file sharing software running?

No. Do you have a link?



The story is about a raid, not jail, but point taken in any case.

EDIT: Later he mentions being “released”, so I guess there was some jail involved.


> it's absurdly irresponsible that browser developers enable it by default

And not absurd for people who disable it because it leaks your real IP when using a VPN


My first reaction was "yikes, that's super unethical to use users' bandwidth, storage, and processing power without their knowledge". But after thinking about it a lot, I have to say I've come around.

As users, we all pay for upload bandwidth, and 99% of the time it's sitting there idly and providing literally no value to us or anyone else. That's because gosh darn it, we paid for it, so there's no way we're gonna let anyone else use it!

But, if we stop thinking in binary terms of "my property" or "your property", and open our minds for a minute, what's so bad, really, about each of us donating a few resources to make the shared Internet run faster for everyone? If everyone does the same, we're all better off, and no one is worse off since the bandwidth was just going to waste anyway. Furthermore, the storage needed is already in use, storing the exact same content, just for your own exclusive use instead of everyone's, and the additional CPU cycles needed are surely much fewer than it takes to run the worthless ads your browser downloads every day.

I think of it as sort of the same logic as vaccination, where we ask people to pay a small private cost in a way that produces outsized benefits for society as a whole. In particular, each person gets an individual benefit much larger than the cost of the private resources they paid.

Now, if the private resources aren't practically free like bandwidth or ROM, but actually expensive like capped data or mobile phone battery life, the ethical logic starts to get a bit murkier. However, I think it's definitely possible to build an effective peer-to-peer CDN without needing to touch those kinds of scarce resources.

The other reservation I have about this sort of project is that you really need to be able to trust the people running it to not abuse their control of the network (which really just means: other people's computers). That's, in my view, a much thornier issue than anything relating to the consumer end of the network.

Let's hope the team will be able to successfully tackle both the technical and the human problems of a P2P CDN, and help push the Internet towards a more communal, sharing paradigm. Best of luck to them.


I wish I knew how this sort of CDN was created, so I could sponsor one explicitly for NSFW sites.

It’s a large and existing market, but one that’s always ignored because of the complexities of dealing with it. However, if you are already an insider, initial moderation isn’t so hard.


I have serious doubts that this is going to be remotely as performant as any major CDN.


Under EU Directive 2009/136/EC a user's permission is required before Arc can store any data in their browser. Is Arc obtaining that permission, or just looking for forgiveness by allowing users to Opt-Out?


a.k.a. the EU Cookie Law (it doesn't just apply to cookies). End users must be asked to opt in; if they reject, then it's OK to deny them access to the page, just like if they reject ads.


Gonna have to specify a section/article of the directive you are referring to to get an answer from op/the creators.

They aren't gonna read a thousand page directive to answer a question about legalities on a hacker news forum.


Yeah I have to say I am for this in the vein of reducing centralization. Yes in a way it is akin to a crypto-miner, but the future internet could literally be distributed in this way, so props to them.


This doesn't actually decentralize anything, it relies on a central system to connect peers.


Ah, well, never mind then



Anyone know what the typical latency is like?

I see mentions of many points of presence and claims of PoPs being close (I assume due to the size and distributed nature of the peers), which therefore could be faster, but there are no measurements or numbers in the FAQ.


How stable is the CDN when each node's lifetime is 5 minutes? The likelihood of an ongoing request being cut off is fairly high. Nodes are also constantly in a syncing state, I assume.


Very! The lifetime of nodes is actually quite a bit longer than 5 minutes as the tab doesn't need to be foregrounded (which Google Analytics and others report as the end of the 'session'). That, and when a node comes online, it joins the network with the cache contents it had filled previously. So it joins the network warm, not cold and needing to be filled.


Interesting.

Is the cache shared between all the websites that install this product?

What happens to an HTTP request to a browser when the user closes their tab? Is it retried on another browser?


Since there is so much interest in basically turning a browser into a BT client for CDN purposes, check out this: https://github.com/webtorrent/webtorrent

Unwittingly, the people who put WebRTC into the browser turned Chrome into the world's biggest file sharing network, and now the genie is out of the bottle.


> Add Arc to your existing CDN(s) and your performance will only improve and you'll only save money.

Seriously?


So arrogant to monetize your website using your users in such a way. It’s one thing to show me ads, it’s another to use me as a node in a distributed CDN for which I gain no value, and everyone else does. It’s not even some kind of p2p situation I as a user opt into because I believe in the communal network.

I hate this leechy behavior that has come to plague the internet for a while now. When it was all nerds it was glorious. Then the money came, and the bullies followed.


You gain the value of whatever service you’re using. It’s the same approach as monetizing a site with ads, except instead of “paying” for the site with your data or eyeballs, you’re paying with your bandwidth.

I actually quite like this idea. Any site/app/etc that is free has to make money, at the very least to pay for infrastructure, and I find this a lot less invasive than surveillance, and not “arrogant” at all.


Do you really think it’s an OR versus an AND? I cannot imagine sites doing this won’t also have ads.

This is also not just bandwidth, but also 300MB of storage.


I wonder, can't a simple `display: none` hide Arc's widget?


As a user, yep. But doing so as a website gets your Arc account banned and your websites blacklisted.


anyone got an invite code? looks interesting


Of course! Shoot me an email at ansgar@arc. =]


How does the performance compare?


wonderful idea. What could possibly go wrong. /s


Gnutella was painfully slow, and error prone even during its best times.



