This is a really nice overview of the nuances of WebRTC.
If you want to try this tech at a higher level, I maintain an experimental hobby project that attempts to abstract all the complicated parts of WebRTC into a simple API. Most importantly, it obviates the need to run your own signaling server by piggybacking on various public protocols like torrent trackers or IPFS to match peers. It's well-suited for quick projects/prototypes: https://github.com/dmotz/trystero/
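The basic flow looks roughly like this (a sketch only; see the repo for the exact, current API, and note that the app/room names below are placeholders):

```ts
// Sketch of joining a room and exchanging messages; names like 'my-demo-app'
// and 'lobby' are just placeholders.
import {joinRoom} from 'trystero'

const room = joinRoom({appId: 'my-demo-app'}, 'lobby')

room.onPeerJoin(peerId => console.log(`${peerId} joined`))
room.onPeerLeave(peerId => console.log(`${peerId} left`))

// makeAction returns a send/receive pair for a named message type
const [sendChat, onChat] = room.makeAction('chat')
onChat((message, peerId) => console.log(`${peerId} says:`, message))
sendChat('hello, peers')
```

No signaling server in sight: peer matching happens over the public infrastructure mentioned above, and everything after that is plain WebRTC under the hood.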
Trystero is magic. I am baffled that it doesn’t have thousands of stars on GitHub. I’ve used it in several projects and it Just Works. Thank you for the incredible library dmotz!
I haven't found a library that makes me feel comfortable with using WebRTC for random little projects. Setting up TURN/STUN is annoying, and I've actually never successfully set up a server/client model (it seems like it'd be useful to have the fast UDP-like stuff, despite WebRTC mostly being about client/client communication).
> despite WebRTC mostly being about client/client communication
This is actually kind of a misconception, though it’s an understandable one given that WebRTC is almost always pitched as a peer-to-peer protocol.
In practice, most people using WebRTC for video are sending their video to a server, not directly to another client. It’s pretty safe to assume that most people who use your app are going to need TURN, and at that point, you’re not really doing peer-to-peer at all, so you might as well just have your browser-based app talk to a server that’s pretending to be another browser.
These servers (called Selective Forwarding Units or SFUs) can operate like a TURN server in the case of a one-on-one call, but they can also multiplex everyone’s feeds in the case of a larger conference (peer-to-peer 5 person calls would require each participant to send 4 copies of their video) and often have extra features like the ability to record calls, transcode streams or convert to other protocols.
The one I’ve used a lot is called Janus[0]. It’s open source and has good docs; I recommend people check it out if they’re interested in getting deeper into WebRTC or other video streaming tech.
Another open-source SFU I've had great experience with is Livekit[0]. Great docs, modern, easy to deploy (it's a golang binary), and it supports a number of egress options too if you want to record the media during a stream to an external system. With their cloud product they've also built a really cool 'mesh-based' SFU-CDN that allows peers to connect to the SFU closest to them and have their media broadcast to other SFUs internal to their network, which allows for easy scaling during broadcast-style use cases.
I don’t remember where I read this (I think it was some published paper), but I was building some audio streaming thing on top of WebRTC, and there was an estimate that 60% of people would be able to do p2p.
I can definitely second Janus. At a previous job, we used it for video calls/streaming mixed with FFMPEG for some transformations along the way. Really reliable stuff.
Haven't used it yet, but if this library built on Cloudflare Workers can do for webrtc what PartyKit.io does for websockets (again via Cloudflare Workers), then I expect it'll be a pleasure to use :)
> and will not be talking about any software in particular
If you want to do something with WebRTC, definitely use libwebrtc, and definitely get acquainted with the field trials, because that's how Google sneaks in all their best functionality.
If you want to do servers or embedded work, you shouldn't. It makes sense to use it in the places it was designed for, though! If you are shipping a client and have needs that match its creators', it's the best choice.
What features of libwebrtc in particular would you like to see in other implementations? I am very excited to see a future with [0] on the client side.
I tried reading that a few times but don't really understand it. Can you put it in terms of what I can't do without it, or can do with it? (I do have some long-reaching WebRTC background, but it's pretty surface level and increasingly outdated.)
I work on str0m along with my colleagues at Lookback. We use it to build something akin to an SFU (Selective Forwarding Unit), i.e. a service that forwards media between many clients in a conference setting. There are other people using it for the same purpose. I think there are some folks trying it out for client-to-client use cases; the thing that's different about that is that more is required of the ICE agent to find a network path between the clients.
str0m doesn't deal with encoding or decoding media at all, but given that you provide those capabilities, it should be possible to use it in e.g. a mobile app setting as part of a client implementation. This use case most likely has a few rough edges at the moment, though.
If you, or anyone else, has further questions don't hesitate to join the Zulip chatroom and ask away.
The problem with libwebrtc is that it is really built to decode video frames being received from the network.
Also the APIs are intended to accept full video frames to be encoded before being sent to the network.
This is great for a browser!
Not so much for an SFU or application that needs to work at the elementary stream (not uncompressed frame) level.
An interesting take for sure. Having worked with libwebrtc on multiple occasions, I've found it pretty hard to separate from the Chrome build toolchain. Especially if you're doing server work, there are pure Go and pure Rust implementations that work great.
How hard is it to implement a WebRTC peer from scratch (from socket calls)? If I don't want to support all profiles, let's say just unrelayed unencrypted data streams?
Is it a "weekend project", or a "reading the RFCs alone will take weeks" project?
I think you can do it in a weekend, maybe two! You are going to implement the simplest parts. This is what it would look like. It also depends on what libraries you have available.
# Accepting an Offer
Parse the Session Description (remote's Offer). Grab the relevant values (ice username+password and certificate fingerprint)
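A rough sketch of that step in TypeScript (assumes a single bundled data-channel section; a real parser would handle per-media-section attributes):

```ts
// Minimal sketch: pull ice-ufrag, ice-pwd and the DTLS fingerprint out of an SDP offer.
function parseOffer(sdp: string) {
  const grab = (attr: string) => {
    const match = sdp.match(new RegExp(`a=${attr}:(.*)`));
    if (!match) throw new Error(`offer is missing a=${attr}`);
    return match[1].trim();
  };
  return {
    iceUfrag: grab('ice-ufrag'),
    icePwd: grab('ice-pwd'),
    fingerprint: grab('fingerprint'), // e.g. "sha-256 AB:CD:..."
  };
}
```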
# Generate your local state
Listen on a random UDP port for ICE. Generate a local ICE Username+Password. Generate a certificate for DTLS.
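Something like this, using Node's dgram and crypto (the DTLS certificate is assumed to be generated out of band, e.g. with OpenSSL, so it only appears as a comment):

```ts
import { createSocket } from 'node:dgram';
import { randomBytes } from 'node:crypto';

// Local ICE credentials: RFC 8445 only asks for enough randomness
// (>= 24 bits for the ufrag, >= 128 bits for the password).
const localUfrag = randomBytes(4).toString('hex');       // 8 chars
const localPwd = randomBytes(16).toString('base64url');  // ~22 chars

// One UDP socket on a random port; STUN, DTLS and SCTP all arrive here.
const socket = createSocket('udp4');
socket.bind(0, () => {
  console.log('listening on UDP port', socket.address().port);
});

// The DTLS certificate is typically a self-signed cert generated separately
// (e.g. with openssl); its SHA-256 fingerprint goes into your answer.
```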
# Generate your Answer
Create an Answer that includes your IP, Port, ICE Username, ICE Password and Certificate Fingerprint.
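For a data-channel-only session the answer can be a fairly small template. A hedged sketch; the exact attribute set is illustrative and glosses over things like a=group:BUNDLE:

```ts
// Sketch of a data-channel-only answer; real stacks include more attributes.
function buildAnswer(opts: {
  ip: string; port: number;
  ufrag: string; pwd: string;
  fingerprint: string; // "sha-256 AB:CD:..." of *your* DTLS cert
}) {
  return [
    'v=0',
    'o=- 0 0 IN IP4 127.0.0.1',
    's=-',
    't=0 0',
    'm=application 9 UDP/DTLS/SCTP webrtc-datachannel',
    `c=IN IP4 ${opts.ip}`,
    `a=ice-ufrag:${opts.ufrag}`,
    `a=ice-pwd:${opts.pwd}`,
    `a=fingerprint:${opts.fingerprint}`,
    'a=setup:passive',  // wait for the remote to start DTLS; 'active' is also legal
    'a=mid:0',          // must echo the mid from the offer
    'a=sctp-port:5000',
    `a=candidate:1 1 UDP 2130706431 ${opts.ip} ${opts.port} typ host`,
    '',
  ].join('\r\n');
}
```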
# Connecting over ICE
Process the inbound STUN packets. Authenticate them with the remote ICE Username+Password. Respond with your local Username and Password.
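A sketch of just the "recognize a Binding Request" part; building the success response (XOR-MAPPED-ADDRESS, MESSAGE-INTEGRITY as HMAC-SHA1 keyed with your ice-pwd, FINGERPRINT) is omitted for brevity:

```ts
import { Buffer } from 'node:buffer';

// Classify an inbound packet as a STUN Binding Request (RFC 5389 header layout).
const STUN_MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;

function parseStunHeader(packet: Buffer) {
  if (packet.length < 20) return null;            // STUN header is 20 bytes
  const type = packet.readUInt16BE(0);
  const length = packet.readUInt16BE(2);          // length of the attributes that follow
  const cookie = packet.readUInt32BE(4);
  if (cookie !== STUN_MAGIC_COOKIE) return null;  // not STUN (probably DTLS or RTP)
  const transactionId = packet.subarray(8, 20);   // must be echoed back in the response
  return { isBindingRequest: type === BINDING_REQUEST, length, transactionId };
}
```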
# DTLS Handshake
Perform a DTLS handshake over the connection established via ICE. Make sure the remote certificate matches the one in the Offer.
# SCTP Association
Over DTLS create a SCTP Association. You now can exchange data streams!
-----
If you have an SDP, ICE, DTLS and SCTP library this should be ~250 lines of code. If you are missing any of those you will have to do a little more work. SDP and ICE could be done in a weekend. SCTP could be done in two weekends. DTLS I found a lot harder!
All of these estimates are for the simplest implementation. You could have years of work ahead of you depending on how far you want to take it :)
I still have PTSD from Chrome blocking sleep on my Mac. Checking the power management status it was always "WebRTC has active peer connections". The only solution was to turn it off completely in Chrome's settings.
I don't know if it was a brain-dead decision at Google or in the design of this protocol, but I associate it with a lack of respect for the user.
What about WebRTC do you find hard? I can connect two browsers and start a video call in ~50 lines of JS.
As you need more features/knobs it does get harder. I think that is just a reflection of how difficult the problem of real-time communication is, though.
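For anyone curious, a hedged sketch of what one side of those ~50 lines can look like (the `signaling` object is a stand-in for whatever you use to shuttle SDP and candidates between the browsers):

```ts
// Caller side; `signaling` is an assumed stand-in for your own transport
// (WebSocket, copy/paste, etc.) between the two browsers.
declare const signaling: {
  send: (msg: unknown) => void;
  onMessage: (cb: (msg: any) => void) => void;
};

const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });

// Capture camera/mic and attach the tracks to the connection.
const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Render whatever the other side sends back.
pc.ontrack = ({ streams }) => {
  (document.querySelector('#remote') as HTMLVideoElement).srcObject = streams[0];
};
// Trickle local ICE candidates out over the signaling channel.
pc.onicecandidate = ({ candidate }) => candidate && signaling.send({ candidate });

signaling.onMessage(async msg => {
  if (msg.answer) await pc.setRemoteDescription(msg.answer);
  if (msg.candidate) await pc.addIceCandidate(msg.candidate);
});

await pc.setLocalDescription(await pc.createOffer());
signaling.send({ offer: pc.localDescription });
```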
Video conferencing seems to be the most wanted app, but it involves too many parts: setting up an SFU, making it work with all the most popular devices, and I'm not sure you can control the exact video resolution/bitrate. Mediasoup seems to have the most complete server app, but making the whole thing work right is still a lot of work. It seems like video meetings should be easy by now; it was easier when Flash was around.
> I'm not sure you can control the exact video resolution/bitrate
You are right, WebRTC doesn't give you control of this. It is on purpose, though. If a webpage could push more bitrate than the link supported, it would cause congestion collapse/be used for DoS attacks. You can request resolution/bitrate, but it will be best effort.
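What you can do is hint, e.g. cap an outgoing video sender via setParameters. A sketch; the browser treats these values as targets, not guarantees:

```ts
// Ask the browser to cap the outgoing video bitrate and scale resolution down.
async function capVideo(pc: RTCPeerConnection, maxKbps: number) {
  const sender = pc.getSenders().find(s => s.track?.kind === 'video');
  if (!sender) return;
  const params = sender.getParameters();
  if (!params.encodings.length) return;               // nothing negotiated yet
  params.encodings[0].maxBitrate = maxKbps * 1000;     // bits per second, best effort
  params.encodings[0].scaleResolutionDownBy = 2;       // request half resolution
  await sender.setParameters(params);
}
```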
> it was easier when Flash was around
I didn't really use Flash (I was on Linux) so I can't speak to that. It was always buggy/hard to set up. As a Linux user, though, I am super grateful for WebRTC!
The only thing I know about WebRTC is that I need to "Prevent WebRTC from leaking local IP addresses" in uBlock. This extremely limited experience has made WebRTC in my mind into a privacy risk. So thanks for this.
In fact, I am on the hunt for a minimal C library to do exactly this for a hobby project, and nbnet is the best candidate I have found so far. The only drawback so far is that I would love to have voice support, and nbnet (rightfully so, to keep it simple) does not provide it, so I would have to roll my own.
Really? WebRTC DataChannels seem like they could fulfill your needs. What is it that prevents using something like WebRTC DataChannels and forces you to use TCP-based connections?
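For what it's worth, on the browser side an unreliable, UDP-like channel is only a few lines. A hedged sketch (signaling omitted, and the 'game' label is just a placeholder):

```ts
// Unordered, lossy data channel: roughly UDP-like semantics over WebRTC.
const pc = new RTCPeerConnection();
const channel = pc.createDataChannel('game', {
  ordered: false,      // don't hold newer packets back for retransmits
  maxRetransmits: 0,   // drop instead of retrying, like UDP
});
channel.onopen = () => channel.send(JSON.stringify({ type: 'hello' }));
channel.onmessage = e => console.log('got', e.data);
// ...then the usual createOffer/setLocalDescription/signaling dance to connect.
```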
So when I looked into it last month, it's true that DataChannels would be the answer. But I couldn't find any "nice" abstraction for them in the language I'm using. I just can't get some standalone WebRTC DataChannels abstraction: many WebRTC libraries don't have an implementation for this, or it's all very specialized for audio transfer. There is the route of using some standalone WebRTC gateway service that your application can send messages to (I think this is the purpose of Janus?), but the point is: it's not easily done. Not in Elixir so far, anyways.
And the second part of it is: I'm sure it's easily done, but at this point there is some incentive to keep whatever implementation you have to yourself.
Or maybe I missed something and I should try looking into it again.
Agreed. I really looked into whether WebRTC would be possible for the game I'm trying to make, but currently it's not as easy as people make it out to be. Fortunately, the communication channel is easily abstracted, so I'm waiting for the day I finally get the abstraction I want.