I've recently been tasked with finding a live video solution for an industrial device. In my case, I want to display video from a camera on a local LCD and simultaneously allow it to be live streamed over the web. By web, I mean that the most likely location of the client is on the same LAN, but this is not guaranteed. I figured this has to be a completely solved problem by now.
Anyway, so I've tried many of the recent protocols. I was really hoping that HLS would work, because it's so simple. For example, I can use the gstreamer "hlssink" to generate the files and basically deliver video with a one-line shell script and any webserver. But the 7-second best-case latency is unacceptable. I really want 1 second or better.
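For concreteness, the kind of one-liner I'm talking about is roughly this (the device path and encoder settings are placeholders, and I'm sketching with software x264 here; point any webserver at the output directory):

    gst-launch-1.0 -e v4l2src device=/dev/video0 ! videoconvert \
        ! x264enc tune=zerolatency bitrate=2000 key-int-max=30 \
        ! h264parse ! mpegtsmux \
        ! hlssink target-duration=2 max-files=5 \
                  location=/var/www/html/segment%05d.ts \
                  playlist-location=/var/www/html/playlist.m3u8

Even with short segments the player still wants a few of them buffered before it starts, which is where the multi-second floor comes from.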
I looked at MPEG-DASH: it seems equivalent to HLS. Why would I use it when all of the MPEG-DASH examples fall back on HLS?
I looked at WebRTC, but I'm too nervous to build a product around the few sample client/server code bases I can find on GitHub. They are not fully baked, and then I'm really depending on a non-standard solution.
I looked at Flash, but of course it's not desirable to use it these days.
So the solution that works for me happens to be the oldest: Motion JPEG, where I have to give up on using good video compression (MPEG). I get below 1 second of latency, and no coding (using ffmpeg + ffserver). Luckily Internet Explorer is dead enough that I don't have to worry about its lack of support for it. It works everywhere else, including Microsoft Edge. MJPEG is not great in that the latency can be higher if the client can't keep up. I think WebRTC is likely better here.
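For anyone curious, the whole ffmpeg + ffserver setup is just a small config plus a feed command, roughly like this (ffserver was dropped from FFmpeg after 3.4, so this assumes an older build; ports, paths, frame rate, and sizes are placeholders):

    # ffserver.conf
    HTTPPort 8090
    <Feed camera.ffm>
        File /tmp/camera.ffm
        FileMaxSize 5M
    </Feed>
    <Stream camera.mjpeg>
        Feed camera.ffm
        Format mpjpeg
        VideoFrameRate 15
        VideoSize 640x480
        NoAudio
    </Stream>

    # start the server, then feed it from the camera
    ffserver -f ffserver.conf &
    ffmpeg -f v4l2 -i /dev/video0 http://localhost:8090/camera.ffm

The browser side is then just an <img> tag pointed at http://host:8090/camera.mjpeg.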
Conclusion: here we are in 2019 and the best low latency video delivery protocol is from the mid-90s. It's nuts. I'm open to suggestions in case I've missed anything.
A fairly long time ago (3-4 years) I was tasked with doing something fairly similar (though running on Android as the end client). HLS was one of the better options but came at the same costs you describe here. However, it was fairly easy to reduce the segment size to favor responsiveness over resilience. Essentially you trade buffer size and bitrate-switching quality for more precise scrolling through the video and faster start times.
I had to hack it quite severely to get fast loading with fair resilience for my use case, as the devices are restricted in performance and can have fairly low bandwidth. Since you're looking at a relatively fast connection, simply reducing the chunk size should get you to the target.
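With your gstreamer hlssink setup that mostly means turning down the segmenting knobs, with the rest of the pipeline unchanged (the values here are guesses, to be tuned against how aggressively the player insists on buffering):

    ... ! mpegtsmux ! hlssink target-duration=1 playlist-length=3 max-files=6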
As a follow up - I've spent a couple years working on a video product based on WebRTC. This either works for a PoC where you just hack things together or on a large scale where you have time and resources to fight odd bugs and work through a spectrum of logistical hoops in setting it up. So unless you plan to have a large-ish deployment with people taking care of it I would stick to HLS or other simpler protocols.
> I looked at Flash, but of course it's not desirable to use it these days.
RTMP protocol has a lot of implementations and is still widely used for the backend part of transmitting video at a low latency (i.e. from the recorder to the server).
RTSP with or without interleaved stream is another option.
DASH/HLS is a solution for worldwide CDN delivery and browser-based rendering. It's poorly suited for low latency.
If you need low latency and browser-based rendering, you need something custom.
You can also consider tunneling over WebSockets. It's a lot easier than WebRTC, especially since you don't need the handshaking nonsense, which often requires self-hosting STUN and TURN servers if you don't want to rely on third parties. IIRC the performance of WebSockets is good enough for companies like Xoom.
You should probably try Mixer. They rolled their own low-latency protocol. It uses a WebSocket as a bidirectional channel so the server can push whatever it wants to the client directly, achieving sub-second delay. (The model here looks more like WebRTC than HLS, though.)
I have no idea what the underlying tech is, but Steam Link can do extremely low latency on the same network and very low latency over the internet. It can also stream non-game applications, though I imagine automating Steam is a nightmare.
My friends and I have our own little streaming website and manage to get 1-2 seconds of delay. It's nothing fancy: NGINX with the RTMP plugin, which receives the streams and just passes them through; once we added encoding we had a noticeable delay. This is Flash tech that can be played back as HTML5 now, but I didn't see it in your list, so perhaps you haven't looked at it.
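For anyone who wants to try it, the pass-through part is only a few lines of nginx-rtmp config (a minimal sketch; the application name and ports are whatever you pick, and there's deliberately no transcoding):

    rtmp {
        server {
            listen 1935;
            application live {
                live on;       # relay incoming publishes straight to players
                record off;    # don't write anything to disk
            }
        }
    }

Publishers push to rtmp://server/live/<streamkey> (e.g. with OBS, or ffmpeg with -c copy -f flv), and players pull the same URL.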
Interesting, I had tried to get an HTML5 video element to read from a gstreamer-based MPEG source, but it would not work. I'm pretty sure that's because gstreamer did not provide a real HTTP server, so the headers were messed up. It's odd, because oggmux did work over tcpserversink. Anyway, I will try this because I'm interested in the resulting latency.
Keep in mind that NDI is a proprietary technology from NewTek, not an open spec like SMPTE 2110/2022. That being said it does work remarkably well in my experience, provided you have a dedicated network for it.
Similar situation here, ended up with the same solution, after an initial attempt with HLS. jsmpeg (https://github.com/phoboslab/jsmpeg) made it pretty easy.
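For reference, the usual jsmpeg wiring is ffmpeg encoding MPEG-1 into MPEG-TS and posting it to the relay script from the jsmpeg repo, which fans it out to browsers over a WebSocket (the secret, ports, and bitrate here are arbitrary):

    node websocket-relay.js supersecret 8081 8082
    ffmpeg -f v4l2 -i /dev/video0 -f mpegts \
           -codec:v mpeg1video -b:v 1000k -bf 0 \
           http://localhost:8081/supersecret

The page then just instantiates the JSMpeg player against ws://host:8082.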
Try streaming TS packets over WebSockets and decoding with FFmpeg compiled to WASM in the browser. I wrote https://github.com/colek42/streamingDemo a couple of years back, and despite the hacky code it worked really well. You could probably do much better today.
We recently completed a project with similar requirements. We ended up using RTSP from the camera and packing it up in WebSockets using ffmpeg. We had sub-second latency. The camera gave us h.264, so we could just repack that.
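Roughly, the repacking can be as simple as one ffmpeg invocation piping MPEG-TS into a small WebSocket relay on stdin (the camera URL is a placeholder and the relay name is made up; it's whatever little server you write or reuse):

    ffmpeg -rtsp_transport tcp -i rtsp://camera.local/stream1 \
           -c:v copy -an -f mpegts pipe:1 | ./ws-relay --listen :8081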
We're giving a talk about the project at the MonteVIDEO Tech meetup, though it will be in Spanish.
Well I was hoping to not have to use a commercial product. From the front page, "Ultra Low Latency WebRTC" is supported only in the Enterprise Edition. I may as well use Flash.
"8-12 seconds End-to-End Latency" for community edition.
Actually a commercial product is not necessarily a problem, but the monthly fees are. If there were a one-time-fee version (perhaps with a limited number of clients or something), then this might work.
I used jsmpeg to live stream camera feeds from robots. There are a few others that do the same. In my case I wrote a custom Go server to handle the multiplexing. It did fairly well and was able to support something like 60 clients at a time. This was a weekend project and I don't have time to keep the robots online, so I will leave you with some video of an early client I built. There are some other videos showing off the robots on my channel.
I also poked around with making a real-time remote desktop client for Linux that could be accessed via a web browser. It too, at least on local LANs, got very low latency video. The link for that is below as well.
Edit: I should mention that latencies were measured in milliseconds, not seconds, even with many clients. I am sure that to scale out to thousands of users I would have to add a bit of latency, but not by much.
Oh yeah, I saw that. I'm also hoping to be able to use the h.264 compression hardware built into the SoC we're using, and it was my understanding that jsmpeg is MPEG-1 only.
That being said, the ffmpeg solution is not using the hardware accelerator either, even though it does support MJPEG. But I think with some work we can get a gstreamer-based solution: the missing part is an equivalent of ffserver that works with gstreamer. The hardware vendors like to provide gstreamer plug-ins for their accelerators.
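As a hedged sketch of what I mean (the encoder element name is entirely vendor-dependent; I'm using the generic V4L2 stateful encoder here, and tcpserversink only as a stand-in for whatever serving element ends up filling the ffserver role):

    gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert \
        ! v4l2h264enc ! h264parse ! mpegtsmux \
        ! tcpserversink host=0.0.0.0 port=8554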
Also, it's weird to me that this needs a giant javascript client library. What about the HTML5 built-in video support?
If you are using MPEG-1 you can just dump the packets on the line. And if you want to get fancy, you can read in an HQ stream and set up a beefy server to run 3 or 4 conversions to different bandwidth classes and move clients up and down as required.
My code is geared to robots and has not been updated recently, but there is at least an example of the simpler multiplexing in Go.
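The bandwidth-class part can be a single ffmpeg process producing several renditions from the one HQ input, along these lines (the bitrates and relay endpoints are made up):

    ffmpeg -i rtsp://camera.local/hq \
        -map 0:v -codec:v mpeg1video -b:v 2000k -f mpegts http://relay:8081/high \
        -map 0:v -codec:v mpeg1video -b:v  800k -f mpegts http://relay:8081/mid \
        -map 0:v -codec:v mpeg1video -b:v  300k -f mpegts http://relay:8081/low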
Their own variation of HLS. Note that except for Safari, browsers don't implement HLS directly, but rather websites do, through HLS.js etc. So you can implement whatever low latency version of HLS you want (assuming it is constructed of HTTP primitives that JS can access).
The major criticism the author has is the requirement for HTTP2 push for ALHLS, which many CDNs don't support. While I agree it is a valid criticism, I am glad Apple is forcing the CDNs to support push. Without the 800lb gorilla pushing everyone to upgrade, we would still be using pens on touchscreens.
I am not a fan of Apple obsoleting features that people love and use. But I always support it when Apple forces everyone to upgrade, because friction from existing providers is what keeps things slow and old. Once Apple requires X, everyone just sighs and updates their code, and 12 months later we are better off for it.
That being said, I agree with author's disappointment that Apple mostly ignored LHLS instead of building upon it. Chunked encoding does sound better.
There are good reasons CDNs don't support http/2 push. It’s hard to load balance and hard to operate, since it requires a persistent TCP connection with the client for the entire duration of the stream, which can be hours. It has consequences that echo almost everywhere in the architecture.
They are likely to “solve” them by charging more. Serving low-latency HLS to Apple devices will cost more, continuing consolidation into the few tech giants big enough to pay what it takes to get inside the iOS walls. Hardly progress.
What exactly is the benefit of HTTP/2 for HLS CDN use in particular?
The obvious benefit of not using it is that you don't need your CDN to do TLS, which is likely to be utterly superfluous if video chunks are already validated through the secure side channel of the playlist.
TLS specifically does not prevent a passive eavesdropper from telling what compressed video you’re watching. If they can drop a few packets and force you to rebuffer, they can tell very quickly—plausibly faster than you can tell watching the video start!
There's variation in the segment-to-segment sizes of video. Watching the stream of data, you can pretty easily find many of the segment sizes, and from there you just need a lookup table.
Figuring out spoken language or which web pages are in packets is fuzzier but still viable.
The main redeeming feature of traditional HLS is that it can use ordinary HTTP CDN infrastructure. If you're going to require video-streaming-specific functionality in CDNs anyway there is absolutely no justification for designing your protocol in this horrendously convoluted, inefficient, poorly-performing way.
It's ironic that "live streaming" has gotten worse since it was invented in the 1930s. Up until TV went digital, the delay on analog TV was just the speed-of-light transmission time plus a little bit for broadcasting equipment. It was so small it was imperceptible. If you had a portable TV at the live event, you just heard a slight echo.
Now the best we can do is over 1 second, and closer to 3 seconds for something like satellite TV, where everything is in control of the broadcaster from end to end.
I suppose this is the tradeoff we make for using more generalized equipment that has much broader worldwide access than analog TV.
Unless your content operates in a very small niche, "real time" is far less important than continuity.
In rough order of preference for the consumer:
1) It starts fast
2) It never stops playing
3) It looks colourful
4) Good quality sound
5) Good quality picture
10) Latency
One of the main reasons why "live" broadcast over digital TV has a stock latency of >1 second is FEC (forward error correction). This allows a continuous stream of high quality over a noisy transport mechanism. (Yes, there are also the local operating rules for indecent behaviour, and switching and effects delays, which account for 10 seconds and >250 ms respectively.)
For IPTV it's buffering. Having a stuttering stream will cause your consumers to switch off/go elsewhere. One of the reasons why RealPlayer held on for so long was that it was the only system that could switch bitrates dynamically, seamlessly, and reliably.
There is a reason why Netflix et al. start off with a low-quality stream and then switch to HD 30 seconds in: people want to watch it now, with no interruption. They have millions of data points to back that up.
Google seems to think they can implement video gaming over IP. And they probably can; my ping to them is only 9 ms, less than a frame.
There is just a broad lack of interest in reducing latency past a certain point unless there is a business reason for it. People don't notice 1 second of latency.
And yet, I was able to game competitively from my apartment in Brisbane, using a server in Sydney, via Parsec; usually coming in at less than a frame of latency, sometimes just over a frame. This was two years ago, too. And Australia isn't known for its amazing internet connections (though mine was better than most).
Just because one group was incompetent doesn't mean another will be.
It has been possible for years to get a total encode+decode latency of less than one frame with x264.
Meanwhile many people are gaming on TVs that impose 3-8 frames of processing lag.
And you can beat most current tech by more than half a frame just by supporting HDMI 2.1 or variable refresh rate. (Instead of taking 1/60 of a second to send a frame, you send it as fast as the cable can support, which is 5-12x faster)
I played over 20 hours of Assassin's Creed through Chrome during the Stadia beta and I couldn't notice any latency. While it might not work for games like CS:GO, for AR, or on bad networks, they 100% have a working product today for Assassin's Creed.
It's not surprising if you think about how our ability to store video has changed over the years. The delay on analog TV is so low because the picture data had to go straight from the camera to the screen with basically no buffering since it was infeasible to store that much data. (PAL televisions buffered the previous scanline in an analog delay line for colour decoding purposes, but that was pretty much cutting edge at the time.) Now that we can buffer multiple frames cheaply, that makes it feasible to compress video and transmit it without the kind of dedicated, high-bandwidth, low-latency links required in the analog days. Which in turn makes it possible to choose from more than a handful of channels.
No, it got worse. Try H.265: compression artifacts are pretty bad in certain scenarios, even with a high bitrate. Same with H.264, though there it can be solved with a high bitrate, but then your file size also gets much, much bigger, which means you will need very low latency, high-speed internet.
I think YouTube is the only streaming service that does it very well, without any issues for the end user, anywhere in the world. Mostly because of their free peering service, which is extremely ubiquitous. https://peering.google.com/#/
Some delay from many producers is almost certainly intentional. Live content providers want to be able to have a second to cut a stream if something unexpected (profanity, nudity, injury...) occurs on set.
Analog TV is also massively less spectrum efficient. You can fit 4+ digital channels in the same spectrum as one analog TV channel.
And don't forget how low and inconsistent the quality of analog TV was compared to what we can broadcast digitally.
The real story here is that latency isn't actually important to live TV, so it's a no-brainer trade-off to make. If you look at other transmission technologies where latency is more important, like cellular data transmission, latency has only decreased over the years.
Thanks, that's exactly how I felt — that there’s a really good and useful article in here, but clouded by assumptions and an attempt to create controversy.
> A Partial Segment must be completely available for download at the full speed of the link to the client at the time it is added to the playlist.
So with this, you cannot have a manifest file that points to future chunks (e.g. for up to the next 24 hours of a live stream) and delay processing of the HTTP request until the chunk becomes available, the way HTTP long polling would be used for chunks.
> On the surface, LHLS maintains the traditional HLS paradigm, polling for playlist updates, and then grabbing segments, however, because of the ability to stream a segment back as it's being encoded, you actually don’t have to reload the playlist that often, while in ALHLS, you’ll still be polling the playlist many times a second looking for new parts to be available, even if they’re then pushed to you off the back of the manifest request.
Which could be avoided if Apple didn't enforce the availability of the download "at the full speed" once it appears in the manifest (long polling of chunks).
LHLS doesn't have this issue, as the manifest file itself is streamed with chunked responses, hence it makes sense (a streaming manifest file).
> For the time being at least, you’ll have to get your application (and thus your low latency implementation) tested by Apple to get into the app store, signaled by using a special identifier in your application’s manifest.
And this makes me think about the implementability of the 1st and 2nd points on ALHLS. Maybe the current "implementation" is compatible, but not with the spec itself.
It is not an IETF standard - those have RFC numbers. It is just a personal draft - any IETF member can upload one of those, regardless of whether it's useful. I'm happy that it has a specification - but it's just a one-man project, not something that has gone through the IETF standards process.
> measuring the performance of a blocking playlist fetch along with a segment load doesn’t give you an accurate measurement, and you can’t use your playlist download performance as a proxy.
I don’t see why this would be the case. If you measure from the time the last bit of the playlist is returned to the last bit of the video segment is pushed to the client, you’ll be able to estimate bandwidth accurately.
> from the time the last bit of the playlist is returned to the last bit of the video segment
Based on my loose understanding of HTTP/2 server push and ALHLS, the sequence of events will be:
1. Client requests playlist for future media segment/"Part"
2. Server blocks (does not send response) until the segment is available
3. Server sends the playlist ("manifest") as the response body along with a push promise for the segment itself
The push then begins with the segment.
The push stream can presumably occur concurrently with the response body stream. So I don't think you can wait until every bit of the playlist comes in. Likewise, you can't use the playlist bytes themselves to gauge bandwidth, because the server imposes latency by blocking.
As usual, Apple pushes NIH instead of supporting DASH, which is the common standard. And they also tried to sabotage adoption of the latter by refusing to support MSE on the client side, which is needed for handling DASH.
> As usual, Apple pushes NIH instead of supporting DASH, which is the common standard.
I mean... HLS predates DASH. It would've been hard for them to support a common standard which didn't even exist at the time. Initial release of HLS was in 2009[0], work started on DASH in 2010[1].
I'd also disagree with the characterization of DASH as "the common standard" - it's certainly a legitimate standard, but I feel like support for HLS is more ubiquitous than support for DASH (please correct me if I'm wrong).
Predating doesn't stop them from supporting something else once it becomes common. They don't do it because they want to impose HLS on others. And their refusal to support MSE[1] on iOS looks even more clearly like an anti-competitive way to do it.
Apple isn't exactly the champion of free standards. Is HLS free for others to adopt? DASH is. The same messed up story happened with touch events for JavaScript. What Apple were pushing wasn't free.
That's the MPEG-LA patent trolls; for reference, that has nothing to do with the MPEG in MPEG-DASH. They can claim they own the Moon the same way. They do it with anything that looks usable and related to video. Patent trolls aren't really the measure of how free the standard is.
Apple, however, are the owners of HLS, and unlike some random patent trolls, if they are insisting on its adoption they have to make sure their patents on it are royalty free. Not that it will protect anyone from further patent troll attacks from the likes of MPEG-LA, but that's a requirement.
Apple has never been shy about communicating which technologies of theirs they have patented. While it would be nice to have an explicit statement from Apple, the absence of it speaks volumes.
Has MPEG made any statement about the MPEG-LA patent pool?
> While it would be nice to have an explicit statement from Apple, the absence of it speaks volumes.
I'm not sure what that means. Either they released it royalty free or not. Since there is no public statement about it, there is no reason to assume it's free.
> Has MPEG made any statement about the MPEG-LA patent pool?
I don't think anyone cares to make statements about patent trolls. The only effective way to deal with them is to bust their claims in court, which not many want to do.