Interesting article. I have a few comments/questions:
HLS support on Android is incredibly spotty. Do you mean in-app HLS support? The Android Chrome browser doesn't support it at all, and only a select few versions of the regular Android browser support it. Then again, there's not a great alternative.
Why offer the multiple bitrates on the HLS stream at all? You know what the client's bandwidth is, why perform three separate live transcodes? Is this because you're delivering the stream to an iOS app and Apple requires that? In Safari or a UIWebkitView based app you probably wouldn't have to do that... Or is it to compensate for potential bandwidth fluctuations?
You mentioned the mpeg-ts segmenter in more recent versions of ffmpeg but also mentioned that it has unacceptably high latency. I have not found this to be the case so long as you set the segment times individually and force new keyframes at given times (-force_key_frames and -segment_times flags, otherwise the segmenter ignores your segment time option and just creates TS chunks at whatever frequency it wants to).
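To illustrate the flag pairing described above, here's a minimal sketch (paths, codecs, and the 10-second cut schedule are illustrative, not from the article) of how `-force_key_frames` and `-segment_times` get fed the same time list so the segment muxer can cut exactly on keyframes:

```python
# Sketch: pair -force_key_frames with -segment_times so the ffmpeg
# segment muxer cuts exactly where keyframes land (10s boundaries here).
# Input/output paths and encoder settings are illustrative.
segment_len = 10
cut_points = [str(segment_len * i) for i in range(1, 7)]  # "10".."60"

cmd = [
    "ffmpeg", "-i", "input.mp4",
    "-c:v", "libx264", "-c:a", "aac",
    "-force_key_frames", ",".join(cut_points),  # keyframes at each cut
    "-f", "segment",
    "-segment_times", ",".join(cut_points),     # cut at the same times
    "-segment_format", "mpegts",
    "out%03d.ts",
]
print(" ".join(cmd))
```

Without the matching `-force_key_frames` list, the muxer can only split on whatever keyframes the encoder happens to emit, which is why the segment times appear to be ignored.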
Pre-transcoding the first few seconds is a really great approach that I had never considered before. Very cool.
I assume you mean spotty on Android pre-4.0? I've worked on HLS streaming for Android recently and didn't have issues with ICS and later. But then again, with Android you can't possibly test on all OS/device combinations, so I'd love to know if there are things I should be watching out for.
> Why offer the multiple bitrates on the HLS stream at all? You know what the client's bandwidth is
If you're watching video over a cellular network, it's very difficult to use a single bandwidth since it can fluctuate a lot. So the best practice (as dropbox is doing) is to offer multiple streams at different bandwidths to the client, and allow the client to pick the best one for the following segment based on current network conditions.
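A toy sketch of the client-side half of this: pick the best variant for the next segment from the measured throughput. The bitrate ladder and the 0.8 safety margin are illustrative assumptions, not Dropbox's actual heuristic:

```python
# Toy variant-selection heuristic: choose the highest-bitrate variant
# that fits under recently measured throughput, with a safety margin.
# The 0.8 margin and the bitrate ladder are illustrative assumptions.
VARIANTS_BPS = [64_000, 400_000, 1_200_000]  # example bitrate ladder

def pick_variant(measured_bps, variants=VARIANTS_BPS, margin=0.8):
    budget = measured_bps * margin
    eligible = [v for v in variants if v <= budget]
    # Fall back to the lowest variant rather than stalling entirely.
    return max(eligible) if eligible else min(variants)

print(pick_variant(2_000_000))  # fast wifi -> 1200000
print(pick_variant(90_000))     # congested cell -> 64000
```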
Even post-4.0 it isn't super reliable. Basic playback of HLS works, but we have had issues seeking, pausing for long periods of time and resuming, etc. I wouldn't be surprised if things like WebVTT (subtitles), ID3 tags, and other advanced features also don't work well.
4.4 is supposed to be much better but I don't work on Android anymore, so I am not really sure.
Hello there, Pierpaolo from Dropbox here. Good questions!
HLS support on Android is quite spotty... yeah... We try our best to build a solution that works as ubiquitously as possible, and it indeed gets challenging on Android. Given our engineering resources, we need to pick our battles, so a good solution (even if not perfect) is better than no solution. Anecdotally, our code has quite a few conditional statements to deal with Android. For instance, at some point we dug into the code and found that certain useful tags like EXT-X-PLAYLIST-TYPE (see http://tools.ietf.org/html/draft-pantos-http-live-streaming-...) are simply ignored. As other folks commented already, we did find the support to be much more stable on ICS and above.
Multiple bitrates are a must for a few reasons: 1) Apple requires a minimum supported rate of 64Kbps, 2) networks are really still incredibly spotty, and providing multiple representations enables the player to act intelligently and switch according to the instantaneously measured throughput of the channel, and 3) providing a high-quality option gratifies users who are previewing content on a fast wifi network.
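In playlist terms, those multiple representations are just entries in the HLS master playlist. A minimal sketch (the bandwidth ladder and URIs are made up; only the 64Kbps low tier reflects Apple's stated minimum):

```python
# Sketch: generate an HLS master playlist exposing several bitrates,
# including a 64 kbps tier per Apple's low-end guideline.
# BANDWIDTH values and URIs are made up for illustration.
variants = [
    (64_000, "low/prog.m3u8"),
    (400_000, "mid/prog.m3u8"),
    (1_200_000, "high/prog.m3u8"),
]

lines = ["#EXTM3U"]
for bw, uri in variants:
    lines.append("#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=%d" % bw)
    lines.append(uri)
master = "\n".join(lines)
print(master)
```

The player reads this once, then switches between the listed variant playlists segment by segment as conditions change.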
Good point on the segmenter built into ffmpeg. I felt like this part of the explanation was a bit too technical for a general audience, but I can comment here. There are several reasons why we went with our own solution. Some are because of the way our pipeline works and the fact that with our tool we can create segments with variable lengths to minimize the startup latency. Specifically, having shorter segments at the beginning of the video allows the player to download less data before starting playout. I want to warn the reader to be careful if you want to experiment with this path, because the standard poses some constraints on how fast you can change the segment duration. So far, our approach seems to work fine, so we are happy with that. Also, at the time we started development we were on ffmpeg 0.8.x transitioning to 1.x, and the built-in segmenter support was not great. As we move forward, we are likely to reconsider the built-in segmenter.
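The variable-length idea can be sketched like this (the specific ramp-up durations are my own illustration, not Dropbox's actual schedule; the one spec constraint shown is that EXT-X-TARGETDURATION must cover the longest segment):

```python
# Sketch: ramp up segment durations so the first chunks are small and
# playback can start quickly. The ramp values are illustrative.
ramp = [2, 2, 4, 8]   # short leading segments (seconds)
steady = 10           # steady-state segment length
total = 60            # total duration to cover (illustrative)

durations = list(ramp)
while sum(durations) < total:
    durations.append(steady)

# Cut points usable for -force_key_frames / -segment_times:
cuts, t = [], 0
for d in durations[:-1]:
    t += d
    cuts.append(t)

# The playlist's EXT-X-TARGETDURATION must cover the longest segment.
target_duration = max(durations)
print(durations, cuts, target_duration)
```

The player only needs the first 2-second chunk before it can start playout, instead of a full 10-second one.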
As far as I know the only absolute requirement is to have at least a single 64K stream. All other bitrates come as general recommendations, but in my experience, offering a few different bitrates amounts to far better, smoother playback overall.
"Warning: These requirements apply to iOS apps submitted for distribution in the App Store for use on Apple products. Non-compliant apps may be rejected or removed, at the discretion of Apple."
Interesting. I recently implemented a very similar transcoding server[1] for StreamToMe[2]. I used node and was pleasantly surprised at how easy it was to parse and segment MPEG Transport Streams[3]. It's good to know I was on the right track...
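Transport Streams are indeed simple to parse at the packet level: fixed 188-byte packets, a 0x47 sync byte, and a 13-bit PID spread across the next two bytes. A minimal header parse (field layout per ISO/IEC 13818-1; the sample packet is hand-crafted for illustration):

```python
# Minimal MPEG-TS packet-header parse: 188-byte packets, sync byte 0x47,
# 13-bit PID across bytes 1-2. Field layout follows ISO/IEC 13818-1.
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def parse_ts_header(packet):
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a TS packet")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    payload_unit_start = bool(packet[1] & 0x40)
    continuity_counter = packet[3] & 0x0F
    return pid, payload_unit_start, continuity_counter

# Craft a fake packet: PID 0x100, payload_unit_start set, counter 5.
pkt = bytearray(TS_PACKET_SIZE)
pkt[0] = SYNC_BYTE
pkt[1] = 0x40 | 0x01  # unit-start bit + top 5 bits of PID 0x100
pkt[2] = 0x00         # low 8 bits of PID
pkt[3] = 0x15         # continuity counter 5 in the low nibble
print(parse_ts_header(bytes(pkt)))  # -> (256, True, 5)
```

Segmenting then mostly amounts to grouping packets by PID and cutting on keyframe boundaries.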
I was under the impression that Dropbox can't "see" your files in the sense that they wouldn't be able to tell what was inside a document or photo; it's all supposed to be encrypted. Given this new information, it appears that not only can they see what you have uploaded, they can actively process, encode, decode, etc.
Not exactly what I'm looking for in an online file storage company. I realize they've never outright said "we're 100% secure end to end" but they've indicated in the past that they have no interest in looking at their users' files. Now they are showing interest. Count me out.
> I was under the impression that Dropbox can't "see" your files in the sense that they wouldn't be able to tell what was inside a document or photo; it's all supposed to be encrypted.
No, that was never true. In particular, that would break Dropbox's model for deduplication, as well as their web clients, and various other things. If you're looking for an end-to-end encrypted solution, Dropbox was never it.
To expand on that, mega.co.nz does attempt to do encryption and deduplication. Each file uploaded is encrypted with a key derived from its own hash, and the hash is stored in the user's encrypted account. The idea is that if two users upload the same file they will produce the same hash and be deduplicated. It's not particularly secure or private in their implementation, but it can certainly be done.
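The idea (usually called convergent encryption) can be shown in a toy sketch. The XOR keystream below stands in for a real cipher and is NOT secure; the point is only that identical plaintexts yield identical ciphertexts, which is what makes server-side dedup possible:

```python
import hashlib

# Toy convergent-encryption sketch: the key is derived from the file's
# own hash, so identical plaintexts produce identical ciphertexts and
# can be deduplicated server-side. The XOR keystream stands in for a
# real cipher; do NOT use this for actual security.
def convergent_key(data):
    return hashlib.sha256(data).digest()

def toy_encrypt(data, key):
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

plain = b"same file, same bytes"
k = convergent_key(plain)
c1 = toy_encrypt(plain, k)
c2 = toy_encrypt(plain, convergent_key(plain))
print(c1 == c2)                     # identical uploads dedupe: True
print(toy_encrypt(c1, k) == plain)  # XOR stream is symmetric: True
```

The known weakness is also visible here: anyone who can guess a file's contents can derive its key and confirm you stored it.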
Unless your online storage method of choice is encrypting your files on the client side before they're transmitted, you should assume they're visible to the operators and anyone they grant access to.
And it's not just Dropbox that can see those files but also Amazon as Dropbox is doing the encoding of the videos on AWS.
I don't think Dropbox has ever claimed that they can't see your files on a technical basis. I'm sure that there are policies and auditing, but it's not secure storage. This is why I've told medical profession clients that they should absolutely NOT be using Dropbox.
If you want security, encrypt on your own or decide if you trust SpiderOak.
If you want more privacy you can try
http://www.nimbusvid.com, which uses mega.co.nz and plays encrypted video in the browser. Content is never sent to nimbusvid.com, so you can watch your videos in privacy.
Indeed, we do try to extract the rotation flags from the original and apply them when we stream the video. One possible issue is that different devices seem to apply different criteria to determine the rotation. For instance, the iPhone seems to detect the rotation only at the beginning of the video capture so if you rotate your device after you started capturing, you end up with content that has a messed up rotation.
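The rotation flag is typically exposed as a "rotate" tag on the video stream's metadata; a sketch of reading it from ffprobe-style JSON output (`ffprobe -print_format json -show_streams in.mp4`). The sample document below is fabricated for illustration:

```python
import json

# Sketch: read the rotation tag from ffprobe-style JSON output.
# The sample probe document is fabricated for illustration.
sample = json.loads("""
{"streams": [
  {"codec_type": "video", "tags": {"rotate": "90"}},
  {"codec_type": "audio"}
]}
""")

def rotation_degrees(probe):
    for stream in probe.get("streams", []):
        if stream.get("codec_type") == "video":
            return int(stream.get("tags", {}).get("rotate", 0))
    return 0

print(rotation_degrees(sample))  # -> 90
```

Whether that single value is trustworthy for the whole clip is exactly the mid-capture-rotation problem described above.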
Having said that, there is a very good chance that you are simply hitting some bug that I'd love to fix :) If you are willing to share the problematic video, you can hit me up at pierpaolo@dropbox.com and we can go from there.
We are considering open sourcing the segmenter tool (3rd). As you can imagine, it's very much tailored for our pipeline, and as the internal tool built into ffmpeg matures, we think our solution will become of marginal value. I keep debating if/when to bite the bullet and try out the ffmpeg one.
The first tool is quite similar to the one that mau posted (https://github.com/danielgtaylor/qtfaststart). Given that we forked quite some time back, I'd suggest starting from that one since it probably has bug fixes on top of our version.
libfaac is a safe aac default, but aacplus works MUCH better in the 8-16 kbps range, which seems to be a stated goal of this page. It turns on SBR+PS and has ffmpeg support.
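Assuming an ffmpeg build compiled with the optional libaacplus encoder (it was an external library in ffmpeg builds of that era), a low-bitrate HE-AAC v2 (SBR+PS) encode might look roughly like this; paths and the 16 kbps target are illustrative:

```python
# Sketch of a low-bitrate HE-AAC v2 (SBR+PS) encode, assuming an
# ffmpeg build compiled with the optional libaacplus encoder.
# Paths and the 16 kbps target are illustrative.
aac_cmd = [
    "ffmpeg", "-i", "input.mp4",
    "-vn",                 # drop video, audio only
    "-c:a", "libaacplus",  # HE-AAC v2 encoder (optional external lib)
    "-b:a", "16k",
    "-ac", "2",
    "output.aac",
]
print(" ".join(aac_cmd))
```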
If any Dropbox people are reading this, can you comment on what criteria you used to pick the audio codec?
You correctly guessed that we picked libfaac as a safe option for compatibility. I've never tried aacplus; it sounds interesting indeed, so we'll likely try it out soonish. Our target rate is a bit higher than those you mention, though. For audio, we target 32kbps at low quality layers and 96kbps at higher qualities. A couple of questions for you:
- how does aacplus work at these rates?
- what compatibility issues can we expect if we were to try it out?