Using BitTorrent with Amazon S3 (amazon.com)
189 points by folz on May 18, 2015 | 75 comments



I'm fairly certain this has been a feature of S3 since its launch in 2006. Here's as far back as I could find on archive.org[0]: it's from May 24th, 2006, and the URL indicates the documentation was published in March 2006.

[0] http://web.archive.org/web/20060421112025/http://docs.amazon...


It's been there since the inception, 03/14/2006 to be exact.

https://noisemore.wordpress.com/2006/03/14/amazon-s3-has-bit...


Yes, and you'll find this date in the top left corner of the linked AWS API documentation.


Anyone downloading software or any other content over a slow and flaky network would wonder why people don't use torrents for distribution, with some "permanent" seeder.

I had a really horrible experience some time back while downloading a piece of software of around 2 GB: the network would die and Chrome would discard the partial download.


Some distributors like Humble Bundle do in fact do this; the "permanent seeders" are called "web seeds", which are implemented by just having a standard HTTP server that supports Range requests. This lets a client fetch arbitrary chunks of a file, so a BitTorrent client can fetch the ranges corresponding to torrent pieces.


Could you elaborate on this? I'm not familiar with BitTorrent's protocol. Can a regular HTTP server (say, nginx) be used as a web seed without any software in front of it translating the BitTorrent protocol requests into regular HTTP requests?


The trick is that the client has been modified to read the URLs in the .torrent file and to construct the appropriate range requests for getting pieces from the web seed.
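For the curious, here's roughly what that mapping looks like: a minimal sketch in Python, assuming the requests library, a single-file torrent, and hypothetical names (webseed_url, piece_length, etc.). A real client would also verify each piece against its SHA-1 hash from the .torrent and handle multi-file layouts.

    import requests  # assumed to be available

    def fetch_piece(webseed_url, piece_index, piece_length, total_size):
        # Map a torrent piece index to a byte range within a single-file torrent.
        start = piece_index * piece_length
        end = min(start + piece_length, total_size) - 1  # Range header is inclusive
        headers = {"Range": "bytes=%d-%d" % (start, end)}
        resp = requests.get(webseed_url, headers=headers)
        resp.raise_for_status()
        # A server that honors the Range header replies 206 Partial Content.
        return resp.content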


In torrent clients that support it, you can simply add a webseed to an existing torrent. Works great for Linux distros.


The torrent client is making regular HTTP requests to the server using the Range header. It's a feature most BitTorrent clients support.


Sure, and you don't have to own the server either. http://burnbit.com/ is good for turning things into resumable downloads.


The problem with torrents is at the ends: initial seeding, and the lack of seeders as traffic declines. Both cases benefit from a static web seed.

I ran a site a couple of years ago that did this very thing, but it was quickly overwhelmed with illegal content.


Chrome's downloads can actually be resumed using wget's --continue flag, I believe.


That depends on a few things, such as the file still being at the previous URL, you still having access to the file, etc.


That's helpful, thank you!


That's not a protocol limitation, that's Chrome; plenty of other programs can resume HTTP downloads.


Some servers don't support it though. I know that's when Firefox throws away partial downloads.
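For what it's worth, here's a minimal sketch (in Python, assuming the requests library and hypothetical names) of how a downloader can resume only when the server actually honors the Range header; getting a 200 instead of a 206 means the server ignored it, and the partial file has to be thrown away, which matches the Firefox behaviour described above.

    import os
    import requests  # assumed to be available

    def resume_download(url, dest):
        # Resume from the size of the partial file, if any.
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        headers = {"Range": "bytes=%d-" % offset} if offset else {}
        with requests.get(url, headers=headers, stream=True) as resp:
            resp.raise_for_status()
            if offset and resp.status_code != 206:
                # Server ignored the Range header: no resume support, start over.
                offset = 0
            with open(dest, "ab" if offset else "wb") as f:
                for chunk in resp.iter_content(chunk_size=64 * 1024):
                    f.write(chunk)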


Chrome? Why Chrome? :-) Browsers are not the best tool for the job, although some support download resume fairly well (e.g. Safari).

There are 'downloaders' to deal with this kind of issue. That said, torrents are the best option over an unstable network connection.


Downloader tools can be quite unpleasant. It's often a choice between overcomplicated adware or the vendor's own software, which may not be very good. Downloading large files should be as simple and reliable for the user as downloading small ones. Perhaps Metalink[1] support would help?

[1] http://www.metalinker.org/


aria2 is a pretty killer tool, no ads or anything. Just a CLI.

http://aria2.sourceforge.net/


Aria2 is incredible. +1 for it! It should be a standard install on everyone's workstation if you need a download manager.


There's plenty of uncomplicated, ad-free software. For Windows, I'm partial to Miniget (which supports both HTTP and BitTorrent): http://www.miniget001.com/


Yes. On OS X, I find myself using Leech¹ quite a bit.

――――――

¹ — http://manytricks.com/leech/


They do. It's used very often in games, for example. However, you have to cover this in your EULA (or whatever) and have a fallback (either optional or automatic), because many ISPs will block such traffic.


In the US they aren't allowed to.


AT&T does. My internet would become unusably unstable every time I torrented, so I called them about it. They said they blocked torrents to "comply with the law" or some such BS, never mind that I was attempting to download a Linux distro.


I didn't say they didn't, just that they weren't allowed to. See e.g. http://money.cnn.com/2015/02/05/technology/fcc-net-neutralit...


You may have also been hitting this: http://en.wikipedia.org/wiki/Bufferbloat

I wouldn't put a lot of faith into what front-line tech support says. They'll often outright lie to you to get you off the phone.


Can you elaborate on this? My current ISP definitely shapes torrent traffic to the point where it is impractical to download anything larger than a few KB.


> I had a really horrible experience some time back while downloading a piece of software of around 2 GB: the network would die and Chrome would discard the partial download.

Me too. And it was _Apple_ providing the download, for crissakes.


Not sure how someone can justify downvoting me. It's a true story and relevant.

Tim Cook, is that you? (I don't think Tim Cook would do that).


Just moved a bunch of our company's large downloads over to S3 recently (mostly BSP releases and drivers), and this would be awesome for that 100MB - 4GB package size range that they have!

Tried it with a few clients: Transmission does not seem to work, though (both the web client and GTK): "Tracker gave HTTP response code 404..."; rtorrent seems to be able to download (but looks like no upload?); Deluge downloads well and also kickstarts the Transmission clients.

I wonder what's unacceptable in the Amazon infrastructure for Transmission... And the 80 kB/s max seed rate mentioned in the comments might be a showstopper already.
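For anyone who hasn't tried it: per the linked docs, you get the .torrent for a publicly readable object by appending ?torrent to its GET URL, and S3 then acts as both the tracker and a (slow) seed. A minimal sketch in Python, assuming the requests library and a hypothetical bucket and key:

    import requests  # assumed to be available

    # Hypothetical bucket and key; the object must be publicly readable
    # for an anonymous ?torrent request to succeed.
    bucket, key = "example-bucket", "downloads/package.tar.gz"
    url = "https://%s.s3.amazonaws.com/%s?torrent" % (bucket, key)

    resp = requests.get(url)
    resp.raise_for_status()
    with open("package.tar.gz.torrent", "wb") as f:
        f.write(resp.content)  # hand the .torrent to any BitTorrent client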


A couple of months ago I tested with Transmission (on a Mac) and it worked fine. The problem from my point of view was that the S3 side only seeded at 50-100 KB/sec, but if I downloaded directly from S3 I could pull 2MB/sec.

Unless you have a bunch of seeds that are essentially permanent, it's pretty useless.


Might still be useful as a "seeder of last resort", to borrow some finance terminology. I.e. back a large number of, say, community *nix ISOs with it.


I'm currently using this to build a decentralized distribution system to ship relatively big database files to iOS clients without having to go through S3 all the time. It works quite well, but some providers throttle the BitTorrent traffic (or worse, throw in RST packets), so I'm not sure how well this will work in practice. Also note that the S3 default seeders only upload at around 80 kB/s, so you'll always need at least one external seed to get good performance.


May I ask how you're doing torrent on iOS? The reason for my interest is that I recently ported ipfs (http://ipfs.io) to iOS and am interested in any experience you've had with running these styles of protocols on iOS clients in the field ..


It's just libtorrent packaged up as an iOS 8 framework + a convenience API, available here: https://bitbucket.org/jberkel/torrada. Still WIP and not used in a production app, but hopefully soon.

It works well if you're on a non-throttled network, and it has support for local peer discovery, which is great for testing (install from a local torrent seed, or another device). Apart from the bandwidth savings, it's also very convenient to let libtorrent handle consistency checks and resuming of transfers.

ipfs looks interesting, have you published your port?


Thanks for the details about torrada, I'll check it out!

Seems that for peer apps on mobile, we need to make it easy for the user to understand their bandwidth usage; indeed, in my testing I never run the app anywhere I don't have solid, unlimited WiFi. For these classes of apps, I think the presentation of this problem to the user is going to be important.

>ipfs port

It's more of a harness than a port. Basically, it's possible to use Go on iOS, so I have a project set up to include Go, cross the bridge between the Obj-C and Go runtimes, and so on .. and then I've built a Go app with ipfs, running on iOS. There aren't any features yet, but the harness is done, so, TODO: write Go, use the ipfs framework, interface with iOS-GUI normatives, etc.

When I get actual ipfs functionality coded, I'll put it up on a repo .. too much futzing around is required right now to see anything actually functioning.


Ah, I had a look at goios. I can imagine that it's a lot of work; the integration looks quite complicated and far from finished. Good luck.

About sharing bandwidth, I was thinking about only enabling it when the user is on WiFi, but I think I'll use torrent more as a distributed mirror network than a "real" P2P system, which is difficult in a mobile context. People can easily "donate" bandwidth by seeding the data files needed for the app (using a separate torrent client though, not the app itself).


Actually, I got everything working, and can build and include an ipfs project in the iOS bundle .. I just don't have a UI to wire it up to the Go-side functionality. But in case you're interested you can have a look here:

https://github.com/seclorum/ios-go-ipfs.git

Might have to change a few paths in the build script, but it'll give you an idea (if you're interested) of how to go about building Go apps on iOS. Turns out it's not that hard! :)

>Sharing bandwidth

Yes, I see the same sort of conclusion from my perspective of wanting to put ipfs on iOS - it's really only a way of committing content to the mesh when the user wants to serve, i.e. idle times connected/charging on local WiFi .. I think my next step will be to wire up the camera and a few fields to be used to create a publishable set of content ..

Anyway, pull requests welcome!


RSTs should be easy to avoid by preferring UDP where possible, but in general I understand why ISPs would try to hamstring BitTorrent, given the dramatically oversubscribed upstream they've got.


You can only get a torrent for objects that are less than 5 GB in size. That's a bit limiting.


It's limited because the original S3 only supported objects up to 5 GB. (Indeed, at times objects of over 2 GB didn't work...)

Bittorrent support is a legacy feature; I don't think I've seen Amazon advertise it for several years.


It is a limit, yes. That is a very large baby you are throwing out with the bath water.


That's 40 gigabit limited, I suppose.


As someone who prizes his high quality movie torrents, I can testify that files over 5GB in size don't stay adequately seeded for very long after their initial release.

It seems like an arbitrary limit, but an adequate one for the time being.


I wouldn't think it's hugely limiting. In practice I'd speculate that the majority of torrents out there are either around 180 MB, 370 MB, or 1800 MB.

If (for whatever reason) you choose to use S3 as your tracker, then you just need to commit to breaking your >5 GB content up into multiple pieces.


You're probably right that the majority of torrents on the internet line up with the common TV show sizes, but it doesn't at all follow that users of this service will follow that distribution.

I have 60 GB files in S3, and saturating available bandwidth to download them in a reasonable time frame when they're not in the same country is actually a bit of a challenge.

BitTorrent would be one of the fastest and most fault-tolerant ways to retrieve the files, so its not being available stings a bit.

While I'm trying to download large files on a consumer internet connection, I'd imagine that moving huge files between geographic locations over server-grade connections would face a similar problem.


A bit OT but...

Javascript is disabled or is unavailable in your browser. To use the AWS Documentation, Javascript must be enabled.

...really? Their documentation is perfectly readable without it. All the links are real, bookmarkable links; even the buttons for the PDF, forums, and Kindle version work.


I suspect there's a default 'javascript required' message in a file that's included with all pages.


Are there easy-to-use multiplatform BT libs around that would let you use this just for high-bandwidth, nearby peers? E.g. filter peers by latency as a first pass.


Why would you care to exclude peers that are slow or far away? What benefit would you expect?

The torrent library to use is libtorrent.

https://github.com/rakshasa/libtorrent


S3 will normally provide ample bandwidth, but in some situations there's a local shared downlink that is the bottleneck. For example, in asset-heavy multiuser VR or game apps used by many people in the same location. By only including nearby peers I get all the benefit while minimizing the risk of people accusing me of clogging / mooching off their uplink.
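One cheap first pass (not part of any BitTorrent library, just a sketch in Python with hypothetical names) is to time a TCP connect to each candidate peer and keep only the ones that answer within a few milliseconds, which in practice tends to select peers on the same LAN:

    import socket
    import time

    def rtt_ms(host, port, timeout=0.25):
        # Rough latency estimate: time a TCP connect to the peer's BT port.
        start = time.time()
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return (time.time() - start) * 1000.0
        except OSError:
            return float("inf")  # unreachable peers sort last

    def nearby_peers(peers, max_ms=20.0):
        # Keep only peers whose connect time suggests they're on the local network.
        return [(host, port) for host, port in peers if rtt_ms(host, port) <= max_ms]

    # Hypothetical candidates, e.g. from a tracker announce or local peer discovery.
    print(nearby_peers([("192.168.1.50", 6881), ("203.0.113.7", 6881)]))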


(You probably want Arvid's libtorrent, at http://www.libtorrent.org, not the one linked above.)


There's webtorrent by feross! https://github.com/feross/webtorrent


I've used this in the past to distribute scrapes of black markets (e.g. https://www.reddit.com/r/DarkNetMarkets/comments/2zps7q/evol... & https://www.reddit.com/r/DarkNetMarkets/comments/2zllmv/evol... ). It saves a lot of bandwidth (which is not all that cheap on S3), and so far the <5 GB restriction hasn't been an issue.


Wow! This may lower the entry barrier for a new movie/TV streaming company. They can lower their costs by offloading bandwidth onto users. Legal torrent streaming might have just gotten the boost it requires.


This isn't a new feature, it was available at least in 2012, possibly earlier.


It seems HN is being trolled by posts from the past.


I read about it in 2010.


I used it in '08-09.


Spotify already did that early on - the client was also a P2P node that would transfer music files from and to other clients. They shut that part down last year.


When I looked into it a few years ago, Spotify used a custom protocol which was similar to BitTorrent but was not actually BitTorrent. It had some clever features to prioritize fetching the chunk of the song about to be played, and it had encryption for the DRM.


I think BBC iPlayer used BitTorrent (or some other form of P2P) when it first launched.


Could be cool to get better speeds, if they have multiple seeders for the bucket.


Honestly, I'm surprised this is built into S3, rather than CloudFront. Ideally, you want your object replicated to a bunch of edge nodes, which then all simultaneously join into a single swarm as seeds.


It predates CloudFront.


Pretty sure it was part of the initial, day one S3 release.


I would like to see that too.


I once read an article or email about why web browsers can't/won't implement BitTorrent clients -- one of the biggest barriers IMO to making BitTorrent more ubiquitous for file download/upload is the fact that I can't just click on a link and have my browser handle the rest, like I can with an HTTP/FTP link.

Does anyone have a link that explains why browsers won't do this, or at least a brief explanation?


Opera used to do this, so there's no particular reason specific to BitTorrent that the others don't. It's probably the usual questions of priority and bloat.

http://help.opera.com/Windows/9.00/en/bittorrent.html


Apart from myself, is anyone else hoping to see downloading straight into an S3 bucket? (Seeding from a bucket has been around for some time; you can imagine my disappointment when clicking the link.)


It still needs a BitTorrent client. How about doing exactly the same thing but using WebRTC, which would work without an additional client application?


Uh, this feature has been available for years, hardly "news".


I'm dreaming about using IPFS on S3. Will it be possible?


awesome




