Instant.io – Streaming file transfer over WebTorrent (instant.io)
349 points by zerognowl on Sept 18, 2016 | 67 comments



I work on some BitTorrent software and while it's a really cool protocol, it isn't designed to sequentially stream data. Some clients support streaming, but the act of prioritizing sequential chunks of data rather than chunks that are most likely to be unavailable in the future is bad behavior for the collective group of peers.

I haven't personally given much thought to solving the problem of streaming, but I am surprised that the WebTorrent FAQ doesn't mention why they didn't take this opportunity to design a protocol with more suitable trade-offs than BitTorrent. I'm getting mixed messages: is their goal to connect the BitTorrent network with WebRTC, or to enable high-quality P2P streaming via WebRTC?


Hi, creator of WebTorrent here.

> [BitTorrent] isn't designed to sequentially stream data

We’re working on improving the algorithm to switch back to a rarest-first strategy when there is not a high-priority need for specific pieces. In other words, when sufficient video is buffered, there’s no need to deviate from the normal piece selection algorithm.
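
Roughly, the idea is something like this (an illustrative sketch, not the actual WebTorrent internals; the names are made up):

    // Sketch of a hybrid piece picker. With little video buffered, grab the
    // next pieces the player needs; otherwise behave like a normal client
    // and pick the rarest missing piece.
    var BUFFER_TARGET_SECONDS = 30

    function pickNextPiece (missing, availability, bufferedSeconds, playbackPiece) {
      if (bufferedSeconds < BUFFER_TARGET_SECONDS) {
        var ahead = missing.filter(function (i) { return i >= playbackPiece })
        if (ahead.length > 0) return Math.min.apply(null, ahead)
      }
      return missing.reduce(function (rarest, i) {
        return availability[i] < availability[rarest] ? i : rarest
      })
    }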

But the fact is that with the speed of today’s internet connections, the user is going to finish fully downloading the torrent in a fraction of the time it takes to view it, so they will still spend more time seeding than downloading.

In practice, the only time that the rarest-first algorithm is important is on poorly-seeded torrents, or in the first few hours of a torrent being published when the ratio of seeders to leechers is really bad. I plan to keep improving the piece selection algorithm so that WebTorrent can be a good citizen.

Also: you should note that not all WebTorrent users stream sequentially. That's just one option for downloading the data.

Also: it's noteworthy that uTorrent, BitTorrent Inc.'s official torrent client (and the largest player by market share), offers sequential downloading as well as selective file downloading. And the BitTorrent network remains very healthy.

> why they didn't take this opportunity to design a protocol that has more suitable trade-offs than BitTorrent

BitTorrent is the most successful, most widely-deployed P2P protocol in existence. It works really well. My goal with WebTorrent was to bring BitTorrent to the web in a way that interoperates with the existing torrent network.

Re-inventing the protocol would have made WebTorrent fundamentally incompatible with existing clients and prevented adoption. The way we've done it is better. The wire protocol is exactly the same, but there's now a new way to connect to peers: WebRTC, in addition to the existing TCP and uTP.

Also, re-inventing the protocol is a huge rabbit hole. There was already a lot of risk when I started the project -- will WebRTC get adopted by all the browser vendors? Will the data channel stabilize and be performant? Is JavaScript fast enough to re-package MP4 videos on-the-fly for streaming playback with the MediaSource API? My thinking was: why add a new wire protocol and several new algorithms on top of all that?

Thanks for your thoughtful comment. Hope you'll give WebTorrent and our new desktop app, WebTorrent Desktop, a try!


Great work. You brought your product to market! Don't mull over what you didn't do.


> Also: it's noteworthy that uTorrent, BitTorrent Inc.'s official torrent client (and the largest player by market share), offers sequential downloading

It's worth mentioning that the option is in a hidden menu.


> We’re working on improving the algorithm

That sounds like you prioritized implementing streaming first over being a good citizen.

> Also: it's noteworthy that uTorrent, BitTorrent Inc.'s official torrent client (and the largest player by market share), offers sequential downloading

To my knowledge that is only available if swarm conditions allow, and it is not purely sequential. That's second-hand knowledge, so I may be wrong, but either way, the default is rarest-first.


> That sounds like you prioritized implementing streaming first over being a good citizen.

I read that as "it sounds like you prioritized getting a working proof-of-concept first over working out the long-term details".


But you don't need streaming for a BitTorrent-over-WebRTC PoC.

And those "long-term details" are implemented by all BitTorrent clients, so they're hardly something novel that needs figuring out.


There's a difference between an industry-wide proof of concept and a personal one. If I were making a text editor, I would begin by focusing on a proof of concept showing that I could accomplish the features I wanted, in the way I wanted to do them -- it wouldn't help much to say "emacs has that feature, so that's fine".

These long-term details are implemented by all established BitTorrent clients. I would bet that version 0.1-alpha of many of them did not implement them, and was rather in a state of "holy moly, this works! I should go show HN".


WebTorrent is two years old and seems to have several active contributors; do you really think the "0.1 prototype" argument applies here?

Not to mention we're not talking about some optional, nice-to-have feature here; we're talking about a core aspect of BitTorrent that gives it robustness.

Also, you forgot to address my other argument.


Yep! I didn't address your other argument because I accept it and there's nothing about it I disagree with. I don't disagree with any of what you just said, either.

I just wanted to point out that "streaming over web-torrents" is the feature being demo'd here, which means that (a) it's a new feature (I assume?) for this project / these developers, and (b) it's clearly something they feel is a nice-to-have feature, because they not only chose to spend time making it, but also announced it to HN when they had a working PoC. If people never posted anything to HN until they were "100% complete", I think this place would be a lot less interesting than it is.


WebTorrent supports both sequential and rarest-first downloads. Just pass the right option to the webtorrent library. The client.add() method takes a 'strategy' option that can be set to 'sequential' or 'rarest'.
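
For example, using the option described above:

    var WebTorrent = require('webtorrent')
    var client = new WebTorrent()

    var magnetURI = 'magnet:?xt=urn:btih:...' // some torrent

    // 'rarest' keeps the classic swarm-friendly behavior;
    // pass 'sequential' instead when you want to stream.
    client.add(magnetURI, { strategy: 'rarest' }, function (torrent) {
      console.log('downloading', torrent.name)
    })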

Feel free to open a GitHub issue if you have suggestions for how we can do better.


So in principle the library supports it, but instant.io always passes sequential, i.e. defaults to disabling rarest-first, correct?


What kind of trade-offs?

Genuinely interested: I know the basics of the torrent protocol, and I don't understand why it wouldn't work for streaming... I mean, you would just need to request the packets in order instead of randomly.

It would be less efficient, sure, but it would work.


> but it would work.

Downloaders are only incentivized to give back data until they are done. Seeders are not really incentivized at all, so they can go away at any moment.

So if everyone downloads sequentially and seeders go away, then you can end up getting stuck in a situation where everyone has the beginning of a file but nobody has the last parts.

When you're streaming, the protocol is not robust.

With random order you have a robust protocol that just degrades in throughput if people are selfish.

And with a webpage-based service, being selfish is as easy as closing a tab.


With most torrent clients (at least the ones I've used), you can choose not to seed the second you begin downloading, so the argument that torrenting ensures people share while their download is underway is weak.

If anything, it's proof that the entire system works because seeders seed for the sake of seeding.


Exactly. In practice, modern torrent swarms have such an over-abundance of seeders that there is ample bandwidth for everyone. Consequently, the famed BitTorrent tit-for-tat algorithm and the rarest-first piece selection strategy both become less important.

Update: it's also noteworthy that uTorrent, BitTorrent Inc.'s official torrent client (and the largest player by market share), offers sequential downloading as well as selective file downloading. And the BitTorrent network remains very healthy.


None of that addresses what I said. I'm arguing about robustness in the absence of seeders.

Seeders are not abundant in all swarms. Nothing in the protocol guarantees their existence, so they do not contribute to the protocol's intrinsic robustness.

Also, torrent clients that run in the background and consume few resources are hardly comparable to things that run in a browser tab; user behavior will differ.


And for popular torrents (torrents with a lot of seeders who have at least okay bandwidth), it shouldn't matter. If a file can be downloaded in less time than it takes to use / consume / watch / listen to it, then there's space for innovation.


Am I missing something or is this thing streaming?


Weird, a friend of mine used to stream with uTorrent years ago without problems. Maybe one pause of a few seconds for a whole TV episode.

The problem is not the protocol but the lack of features like subtitle support, and torrent contents not being standardized. All this can be easily scripted to cover most cases.


The problems are not for the individual user, but for the rest of the group. Downloading blocks in sequential order guarantees that the last blocks will be rarer on average, which is bad for collective performance.


While this is strictly speaking true, considering today's fast bandwidth, I don't think this is that big an issue. I am using WebTorrent's desktop client, which has a streaming mode, and it usually goes like this: add a magnet, start watching (in streaming mode), the download finishes within 5 minutes, and I keep seeding the full file for the rest of the movie. Usually my ratios are better than with a non-streaming torrent client. Kudos to feross for his great work!


It doesn't have to be that bad. Users are likely to have more bandwidth available than is needed to stream the file (if they didn't, streaming would never work).

Unlike simple HTTP streaming, the clients can use this spare bandwidth to download some blocks from the end of the file, even when streaming from near the start. So a sensible torrent streamer can still ensure that later blocks are not too rare.
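
One way to do that (a sketch with made-up names) is to reserve a share of the request slots for out-of-order pieces:

    // Sketch: keep most requests on in-order pieces for smooth playback,
    // and spend the spare share on random missing pieces (often near the end).
    function choosePiece (missing, playbackPiece) {
      if (Math.random() < 0.8) {
        var ahead = missing.filter(function (i) { return i >= playbackPiece })
        if (ahead.length > 0) return Math.min.apply(null, ahead)
      }
      return missing[Math.floor(Math.random() * missing.length)]
    }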


I take your point. It's certainly not the end of the world, but it is sub-optimal.


The increase in seeds from a more ergonomic protocol offsets the downside. Especially since the population mostly has the early blocks and most of the population also wants the early blocks, there's room for plenty of leech-to-leech sharing.


Guys, guys, hear me out ... the solution is to use "middle in" (not to be confused with middle out from Pied Piper).

If you download from the front and the back simultaneously, only the middle blocks will be scarce, and you still get at least half your original streaming rate. And no weirdo will stop watching midway through a movie.
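
The piece order would look something like this (sketch):

    // 'Middle in': alternate pieces from the front and the back,
    // so only the middle of the file ends up scarce.
    function middleInOrder (numPieces) {
      var order = []
      for (var lo = 0, hi = numPieces - 1; lo <= hi; lo++, hi--) {
        order.push(lo)
        if (lo !== hi) order.push(hi)
      }
      return order
    }

    middleInOrder(6) // [0, 5, 1, 4, 2, 3]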


That doesn't solve the problem, it just moves it around. A reasonable compromise would be to use any "spare" bandwidth to fetch random blocks.


And unfortunately, that's not what happens. Clients instead linearly download an hour's worth of content.

That being said, though, do streaming clients make much of a contribution to the amount of seeded data?


If somebody were a genius, they'd write BitTorrent content-delivery software that arbitrages those pauses with preloaded ads.


And then someone would come along a few days later with a client that doesn't display those ads.


You'd still have to wait for the first block of data (piece in BitTorrent lingo) to complete.


I'd rather have a buffering symbol than an ad.


Nice one! Plus you would not have to download the ad.


No!


The BitX program seems to handle subtitles outside of the container and protocol (via some web service, I guess).


Most of the webtorrent streaming demos I've seen have a webseed fallback, so it's not really that important unless you're doing something completely distributed.


BitTorrent streams just dandy if you request/prioritize packets in order rather than random order. Most clients already have this capability.


It works fine, and is even supported in the official BitTorrent clients nowadays, but it's correct that BitTorrent wasn't designed to stream; streaming was controversial in early BitTorrent.

Clients are meant to prioritize the rarest pieces in the swarm first, with some randomization thrown in. Downloading sequentially is bad for the swarm's health, but as it turns out, not bad enough that the protocol can't handle it, at least as long as there's a mix of streaming and classic clients, or streaming clients use a combination of classic and streaming behavior.


But it's sometimes very bad for the network. There are a few papers out there about how, if too many people stream rather than download in random order, it can lead to bad performance.


Is this necessarily true if all of the connected peers are also streaming? In my understanding, BitTorrent optimizes for the most in-demand chunks, and in a streaming context those would be the ones at the beginning of the video.


It's actually the other way around. BitTorrent clients should prioritize the rarest pieces; there isn't really a notion of which pieces are most in demand.

If you were to combine streaming with a BitTorrent protocol that prioritized the most in-demand pieces, you would probably not be able to watch most videos to the end without pauses to buffer, or maybe not at all.

By prioritizing the rarest pieces, even if all seeds leave the swarm, there's a much better chance that the swarm combined still has all the pieces and the torrent can still be finished.


I see a lot of points about how this isn't exactly best practice, but I still don't follow why it isn't.

Say you're already running a video sharing site and your servers are serving up all the content to the clients. So, you add your servers as seeders. The client comes in with support for WebRTC, requests packets in order, gets your servers as seeders along with a couple other people watching the video, and everyone goes along their merry way.

The rare portions don't seem to be an issue because your servers are always seeds, always running, and already have the capacity to support all the demand.

Is this not a win/win to reduce some bandwidth consumption?


Absolutely. All the talk about rejecting streaming really concerns "true" P2P swarms, where everybody can be a seeder, everybody can be a leecher, and there is only one "true" source, the original seeder. In those cases peers can go down at any moment, so it is very important for the swarm's vitality that pieces be distributed as efficiently as possible.

Your scenario is more or less what we have today in swarms comprised of many desktop peers and a few high-speed, always-on seedboxes that already act like a kind of CDN.

The more seeders there are, the better, in any situation. The question is whether, in the swarm we're talking about, you can expect some seeders to be relatively long-lived (in which case streaming is OK) or whether it's a free-for-all (in which case it's not). Not all swarms are of the first type, far from it.


BitTorrent is designed not to depend on such central, always-on servers. Avoiding piece-availability bottlenecks is one of its robustness features.

If you substitute built-in robustness with servers then yes, of course it will still work. But you're weakening the decentralized nature of the protocol by doing so.


Tried to stream a 20GB mkv and it ate all my memory until the oom killer took over =(


Yup, it uses the browser's memory to transfer files.

Until browser apps can be given permission to access the file system, this will be the case.


We have this today; it's just that this site doesn't support it. The only place I've seen it used (other than for thumbnailing images dragged and dropped onto imgur and whatnot) is Mega.

Spec: https://www.w3.org/TR/FileAPI/

MDN: https://developer.mozilla.org/en-US/docs/Web/API/File_and_Di...


The MDN article mentions something which I have a question about:

>The API doesn't give you access to the local file system, nor is the sandbox really a section of the file system. Instead, it is a virtualized file system that looks like a full-fledged file system to the web app. It does not necessarily have a relationship to the local file system outside the browser.

>What this means is that a web app and a desktop app cannot share the same file at the same time. The API does not let your web app reach outside the browser to files that desktop apps can also work on. You can, however, export a file from a web app to a desktop app. For example, you can use the File API, create a blob, redirect an iframe to the blob, and invoke the download manager.
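
In today's terms, that export flow looks roughly like this (a sketch; an anchor with a download attribute has largely replaced the iframe redirect):

    // Sketch: hand a generated Blob to the browser's download manager.
    var data = new Uint8Array([/* file contents */])
    var blob = new Blob([data], { type: 'application/octet-stream' })
    var url = URL.createObjectURL(blob)

    var a = document.createElement('a')
    a.href = url
    a.download = 'myfile.bin' // suggested filename
    document.body.appendChild(a)
    a.click()
    a.remove()
    // Later, URL.revokeObjectURL(url) releases the blob's memory.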

Is it possible to "see" from JavaScript whether or not the download manager has downloaded the file completely, so that the file can be removed from FileAPI storage?


I think the spec you're looking for is the FileSystem API (https://developer.mozilla.org/en-US/docs/Web/API/File_and_Di...), not the File API.

Unfortunately, this is a non-standard API implemented only by Chrome. You have to use other APIs like IndexedDB and WebSQL (the latter also deprecated and non-standard) to get a working solution in all browsers.
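
A bare-bones chunk store on top of IndexedDB looks something like this (a sketch; the database and store names are made up):

    // Sketch: keep downloaded pieces in IndexedDB instead of in memory.
    var open = indexedDB.open('torrent-cache', 1)
    open.onupgradeneeded = function () {
      open.result.createObjectStore('pieces') // keyed by piece index
    }
    open.onsuccess = function () {
      var db = open.result

      function putPiece (index, buffer, done) {
        var tx = db.transaction('pieces', 'readwrite')
        tx.objectStore('pieces').put(buffer, index)
        tx.oncomplete = done
      }

      function getPiece (index, cb) {
        var req = db.transaction('pieces').objectStore('pieces').get(index)
        req.onsuccess = function () { cb(req.result) }
      }
    }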

This deficiency is really holding back the web.


Yup, this works - https://github.com/erbbysam/webRTCCopy/blob/master/client/js... (not the cleanest code)

You can then use idb.filesystem.js to add API support for Firefox, etc. Search the file above for "is_chrome" for a few idb.filesystem.js-specific quirks.

Looking at that page, it looks like Firefox will ship with support in version 50?


I'm using idb.filesystem.js in https://www.sharedrop.io, so that only a very small part of the transferred file is stored in memory. But without asking users for permission (i.e. using non-persistent storage) you "only" get ~4GB (not sure exactly; I tested it with files up to 1.5GB).


Have you looked into using IndexedDB? It's pretty widely supported [1] and was far more performant than the File API the last time I used it intensively (a few years ago).

[1]: http://caniuse.com/#feat=indexeddb



I recently found this service: https://reep.io which, I believe, uses WebRTC to transfer directly between two browsers (they claim that after the initial 'handshake' they are out of the equation). I'm curious how it compares with Instant.io for simple file-sharing use cases (example: sending my mom a movie of my kids that is too large to email, in a manner that lets a non-technical person easily receive, view, and save it).


Does anyone know if this works privately? Or if there is a good way to seed files only between friends?


Introduce something only you and your friends know. Zipping with zero compression and a password, or adding a random file, will make the swarm completely independent from any other.
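
For example, with the webtorrent library you could seed your file alongside a small random "salt" file; the extra file changes the info hash, so the swarm is unique (a sketch; the path and names are made up):

    var WebTorrent = require('webtorrent')
    var crypto = require('crypto')

    var client = new WebTorrent()

    var salt = crypto.randomBytes(16)
    salt.name = 'salt.bin' // webtorrent wants a name when seeding a Buffer

    // Seeding the real file plus the random file yields a brand-new
    // info hash, so only people you give the magnet link to can join.
    client.seed(['/path/to/movie.mkv', salt], function (torrent) {
      console.log('share this with friends:', torrent.magnetURI)
    })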


Would really like to know this too


Never worked for me. Tried to download Debian, no luck.


Why not simply use WebRTC?


Cool! But be aware that WebRTC leaks the public IP address of VPN users, and also leaks hashes of device IDs. [0] And in Chrome, it's very hard to block. This is a dangerous mix with talk of torrents :(

[0] https://www.browserleaks.com/webrtc

Edit: from feross's reply, I gather that WebRTC no longer leaks ISP-assigned IPs when using a VPN.


> WebRTC leaks public IP address for VPN users

This is incorrect.

WebRTC data channels do not allow a website to discover your public IP address when there is a VPN in use. The WebRTC discovery process will just find your VPN's IP address and the local network IP address.

Local IP addresses (e.g. 10.x.x.x or 192.168.x.x) can potentially be used to "fingerprint" your browser and identify you across the different sites that you visit, like a third-party tracking cookie. However, this is a separate issue from exposing your real public IP address, and it's worth noting that the browser already provides hundreds of vectors for fingerprinting you (e.g. your installed fonts, screen resolution, browser window size, OS version, language, etc.).

If you have a VPN enabled, WebRTC data channels will not connect to peers using your true public IP address, nor will it be revealed to the JavaScript running on the webpage.

At one point in time, WebRTC did have an issue where it would allow a website to discover your true public IP address, but this was fixed a long time ago. This unfortunate misinformation keeps bouncing around the internet.

There's now a spec that defines exactly which IP addresses are exposed with WebRTC. If you're interested in further reading, you can read the IP handling spec for yourself.

https://tools.ietf.org/html/draft-ietf-rtcweb-ip-handling-01
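
You can also check what your own browser hands out; something like this logs every ICE candidate (and thus every address) a page can see:

    // Sketch: log the ICE candidates (and thus addresses) WebRTC exposes.
    var pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
    })
    pc.createDataChannel('probe') // kick off ICE gathering
    pc.onicecandidate = function (e) {
      if (e.candidate) console.log(e.candidate.candidate)
    }
    pc.createOffer().then(function (offer) {
      return pc.setLocalDescription(offer)
    })
    // With a VPN on, you should only see the VPN and local addresses.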


Thank you. That's good to know. So is that now the case in all browsers?


It's the case in Chrome, Firefox, and Brave. I assume Opera is the same since it uses Chromium under the hood. I don't know about Microsoft Edge.


uBlock blocks WebRTC leaking. It's also way better and faster than the crummy Adblock extension.


Well, I use Firefox, and just disable WebRTC :)

So does uBlock allow web torrents while blocking WebRTC leaks? I doubt it, because peers need to know your public IP address. Unless you run a VPN client on the router, anyway.


I don't think it disables WebRTC, per se. I think uBlock prevents WebRTC applications from leaking your IP. Hangouts still works with WebRTC leaking disabled through uBlock, which leads me to believe that's what's going on.


If feross is right that WebRTC no longer leaks your IP, then maybe it's not necessary to prevent leaks with uBlock?



