Having implemented the entire WebRTC spec a couple times now, what wisdoms have you learned that can be shared with us mortals?
If you could write a protocol for the same use-case right now that instantly became as popular and standard as WebRTC, what would be different about it?
I am so glad you like them! Working on Pion and the KVS WebRTC stuff has been super rewarding because people want it. I have built so many things that flopped, so this is a welcome change of pace. Thank you so much for being part of the community; talking to people on the Pion Slack about WebRTC is the high point of my week :)
> Having implemented the entire WebRTC spec a couple times now, what wisdoms have you learned that can be shared with us mortals?
The biggest thing I learned during all this was community/collaboration. With all the RFCs I could grind it out and learn them/write unit tests, but working with people took a lot more. With Pion I wanted to build a community/get people excited. I don't always get it right, but I love when I can figure out where my interests and other people's lie.
The next big one is convincing people that WebRTC/P2P is important. It has been really hard to convince the Go community that this stuff is important; I have gotten LOTS of talk rejections. I feel so strongly that WebRTC/P2P is the future I want to see. I just want to see an internet that is resilient to monoliths going down and encourages direct collaboration between people.
> If you could write a protocol for the same use-case right now that instantly became as popular and standard as WebRTC, what would be different about it?
I think WebRTC as a protocol is really wonderful. I love that it took established/working technologies and wrapped them up into a nice package. This has let me bridge WebRTC with lots of other interesting projects/hacks. I think that is why WebRTC will eventually eat the RTC/VoIP world: because it is a standard and uses existing technologies, it works everywhere.
I would like to see lots of changes to WebRTC as an API though. The W3C/JavaScript API doesn't give you access to enough things; I think the more things developers can tweak, the more interesting things we will see built. This was a slide I did for the W3C: https://i.imgur.com/vHTnlFY.png It hasn't gotten much reception yet, but I am going to keep trying :)
> Media via Datachannels is being chosen more and more (can be HW accelerated)
That piqued my curiosity. Could you please expand on this? What kind of hw accel can be applied to video when it is sent through DataChannels instead of the standard video channel?
The encryption for DataChannels is HW accelerated while video is not. I was able to convince people to change this one! So a developer can opt-in to the accelerated profile.
Dude, thank you for this. My team has been trying to figure this out - we know we will eventually need a way to utilize WebRTC in a low-level context, but couldn't dig up any cases of such an implementation. In addition, we were constantly frustrated by the lack of HW acceleration that created a performance bottleneck. Great work!
Nice speedup! However it seems that AES-GCM comes at a cost of considerable increases in packet sizes, from what I could read in the link. Have you observed an increase of packet loss and latencies in real-world networks? (esp. with third world connections)
The improvements in CPU usage are clear for the server, but the choice doesn't seem so obvious to me
EDIT: I meant that, for the quality of experience of the end user (who doesn't care about server load), it doesn't seem like an immediately obvious better choice to just use GCM, until some empirical testing shows whether the potential increase in packet loss causes noticeable problems or not. It would be very cool to read any observations in this regard.
Sorry I haven't done any measurements myself. Agree, it is a trade-off for sure, if you are doing video and packets are already ~1300 bytes it probably makes sense. For Opus only you are really blowing up your payload.
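To put rough numbers on that trade-off, here is a back-of-the-envelope sketch, not a measurement. It assumes the two common SRTP profiles, AES_CM_128_HMAC_SHA1_80 (10-byte auth tag) and AEAD_AES_128_GCM (16-byte auth tag), a plain 12-byte RTP header with no extensions, and made-up payload sizes: 80 bytes for a 20 ms Opus frame at 32 kbps and 1200 bytes for a video packet. The function names are hypothetical.

```c
#include <stdio.h>

/* Assumed sizes in bytes. The tag lengths come from the SRTP profile
 * definitions; the header size assumes no CSRCs and no header extensions. */
enum {
    RTP_HEADER_LEN   = 12,
    TAG_HMAC_SHA1_80 = 10, /* AES_CM_128_HMAC_SHA1_80 */
    TAG_AES_GCM      = 16  /* AEAD_AES_128_GCM */
};

/* Size of one SRTP packet for a given payload and auth tag length. */
unsigned srtp_packet_len(unsigned payload_len, unsigned tag_len)
{
    return RTP_HEADER_LEN + payload_len + tag_len;
}

/* Extra bytes per packet when switching from HMAC-SHA1-80 to AES-GCM. */
unsigned gcm_extra_bytes(unsigned payload_len)
{
    return srtp_packet_len(payload_len, TAG_AES_GCM) -
           srtp_packet_len(payload_len, TAG_HMAC_SHA1_80);
}

/* Relative growth in hundredths of a percent (integer math, rounds down). */
unsigned gcm_growth_hundredths_pct(unsigned payload_len)
{
    return gcm_extra_bytes(payload_len) * 10000u /
           srtp_packet_len(payload_len, TAG_HMAC_SHA1_80);
}

/* Prints one line of the comparison for a labelled payload size. */
void print_growth(const char *label, unsigned payload_len)
{
    unsigned g = gcm_growth_hundredths_pct(payload_len);
    printf("%s: %u -> %u bytes (+%u.%02u%%)\n", label,
           srtp_packet_len(payload_len, TAG_HMAC_SHA1_80),
           srtp_packet_len(payload_len, TAG_AES_GCM),
           g / 100, g % 100);
}
```

Calling `print_growth("opus", 80)` and `print_growth("video", 1200)` with these assumed sizes gives 102 -> 108 bytes (about 5.9%) for the Opus packet and 1222 -> 1228 bytes (about 0.5%) for the video packet. Whether that extra 6 bytes per packet matters on a lossy link is exactly the thing that would need real-world testing.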
You might see CPU improvements on the client side as well though! If you have a fixed set of clients it might be something worth evaluating.
I personally don't have any real-world deployments I can measure. The AES-GCM work came about because a Pion user mentioned it to me, so then I added it/tried to push the issue in the IETF/W3C.
I agree it needs more low-level controls. The API is very high level at this moment. Luckily that seems to be a focus of WebRTC NG being discussed.
I also think SDP wasn't such a great choice. ORTC, which was pushed early on by Microsoft, was better in this regard, but luckily a lot of ideas from ORTC are also making their way into WebRTC.
"I feel so strongly that WebRTC/P2P is the future I want to see."
This sounds like it'd bring back the good ol' DDoS days where cable modem users could just knock dial-up users offline with a simple packet flood, or ACK flood, or SYN attack, because the IP address was known. Especially since WebRTC has a nasty habit of leaking IPs when (easily) misconfigured, so much so that VPN services have dedicated WebRTC checks to see if you're leaking your IP, because misconfiguration is so easy.
I am fine with NATs! I think it is a good compromise between security and connectivity. If two users want to connect they have to explicitly do NAT traversal.
Wasn't mDNS designed for small networks like intranets, not the whole internet?
I'm reading a discussion on Google where in the comments it is mentioned that you can still obtain IP addresses even with mDNS enabled if you're allowing video and audio with a specific flag set (again with the configuration implementation.)
Sean is phenomenal. When I was writing a WebRTC service a year ago the Pion implementation looked really, really good, so good that I learned Golang just to use Pion!
Appreciate the kind words :) Your bug reports were super helpful. Getting users like you is phenomenal; many people just get frustrated and leave without telling me.
I really enjoyed getting your perspective on pion/webrtc as a new Go developer, stuff like that improves the project a lot!
One thing I often ask about in WebRTC threads is if anyone has come across any native Erlang or Elixir implementations of the stack in the wild (not just signalling). I've been hoping for a solution like this to pop up in that ecosystem, but so far I haven't found anything. The latest versions of OTP now support DTLS as part of the main library, so it seems more possible now than ever!
For others who are looking for similar WebRTC implementations, these are also out there:
Membrane [1] is a project that aims to create a pipeline-based media framework, much like GStreamer, but more to the point (smaller scope) and written for the Erlang VM. We could foresee a WebRTC module existing at some point in the future, but right now the project seems to be just in its infancy.
I found it while dreaming about writing an Erlang-based WebRTC media server like Kurento, for which this kind of framework is invaluable, and currently GStreamer is the king.
I don't know Erlang, but have been looking for an excuse to learn :) If someone is able to lead/maintain the project I can help answer questions/do dev work. I just don't know how to write idiomatic Erlang.
I bet we could implement ice-lite, SRTP and SDP parsing in a few weeks easily.
Wow, that's a huge effort and codebase! Congratulations for releasing it! I like the consistent coding style, the fact that cmake is used, and that sanitizers are used!
I see the title mentions embedded systems. What systems are target of the SDK besides embedded linux? I looked for some kind of platform abstraction layer which would be useful for that, but couldn't find it so far. But actually not sure whether it would be worthwhile at all to run something like WebRTC on a MCU with a RTOS. Protocols that complex often require a ton of dynamic memory, which is not that compatible with the common no-allocation policies on embedded platforms.
I actually don't know the embedded world at all. My background is WebRTC/userland programming. My answers aren't going to be great, but this is where my learning is at today :)
> What systems are target of the SDK besides embedded linux
I had a few conversations with embedded devs and they said if I wrote this in Pure C (and avoided GCC-isms) it would work well for them. Most of the people I have worked with so far are doing embedded Linux; a few are doing non-Linux, but I just get bug reports from them. They tell me their C compiler chokes on this etc...
> I looked for some kind of platform abstraction layer
We have a library called PiC[0] that does some of that. I don't think it is as low level as you are thinking though! It has portable atomics, and lets people provide their own allocators, stuff like that.
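For anyone wondering what "provide their own allocators" can look like in C, here is a rough sketch. It is not the actual API of the library mentioned above: `allocator_hooks`, `arena`, `arena_alloc`, `arena_release`, and `lib_alloc` are all names made up for illustration, and it assumes a C11 compiler for `max_align_t` and `_Alignof`. The idea is that the library only ever calls through a hook table, so a device with a no-heap policy can back it with a fixed buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook table the application hands to the library. */
typedef struct {
    void *(*alloc)(void *ctx, size_t size);
    void  (*release)(void *ctx, void *ptr);
    void  *ctx;
} allocator_hooks;

/* Bump allocator over a caller-supplied buffer: no heap involved.
 * The buffer is assumed to be aligned for max_align_t. */
typedef struct {
    uint8_t *base;
    size_t   capacity;
    size_t   used;
} arena;

/* Hands out the next aligned chunk, or NULL when the buffer is exhausted. */
void *arena_alloc(void *ctx, size_t size)
{
    arena *a = ctx;
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);

    if (start > a->capacity || size > a->capacity - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Individual frees are a no-op; the whole arena is reclaimed at once
 * by resetting `used` to 0 when the session ends. */
void arena_release(void *ctx, void *ptr)
{
    (void)ctx;
    (void)ptr;
}

/* Library-side code never calls malloc directly, only the hooks. */
void *lib_alloc(const allocator_hooks *hooks, size_t size)
{
    return hooks->alloc(hooks->ctx, size);
}
```

A bump allocator like this trades memory reuse for predictability, which is usually the trade an RTOS project wants; a pool or slab allocator could be plugged into the same hook table instead.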
> not sure whether it would be worthwhile at all to run something like WebRTC on a MCU with a RTOS
Yea I think it would be pretty hard. Getting usrsctp and OpenSSL on those devices seems pretty insane. I have looked at OpenSSL alternatives, but for usrsctp I think we will need to solve it ourselves. I want to explore bringing down the footprint more, but the project is really driven by customer demands and no one has explicitly asked yet.
There are a few widely-used embedded SSL stacks, of which mbed TLS is probably one worth targeting or adding support for (wolfSSL is another.) So long as this requires OpenSSL it's gonna basically be stuck in Linux.
I am very curious to understand better how this stacks up against the usual C++ WebRTC implementation when code size isn't an issue, particularly with respect to flexibility. (I work on a project that does VPN services over WebRTC, am doing stuff like WebRTC tunneled over WebRTC, have WebRTC routing over LwIP, and we are working on improving the congestion control algorithms.) My stripped down compile of that library--which admittedly is dropping most of the audio/video functionality as I only care about DataChannels, so maybe if I were doing video I'd be freaking out at the size of the stack (though I would imagine the video codecs would dominate the size, no?)--is clocking in at about 7MB, which is much larger than 200kB but nowhere near the point where I start to care: I mean, the default buffer size per channel in the library is a crazy large 16MB buffer, so I find myself not caring much about the size of my text segment ;P. (Are embedded systems really that constrained these days?)
One thing I have enjoyed about the C++ build is that it has tons and tons of knobs for me to mess with in the binding API... I have to reimplement the network detection and am using two separate socket implementations (which is all pretty trivial as just about everything can be subclassed and replaced), I mess with the offers and answers before handing them back/forth through the stack (which is easy as there is an object-oriented model for all of the data structures; I mean, it isn't a great model, but it is at least a model), I am able to easily assign work to occur on specific threads (and have been able to share a signaling thread while splitting networking threads across backends), and I at times reach into the heart of the system (without any code modifications, though with a really hilarious template trick to help me access private C++ class members) to pull up the SCTP stack internals. (Note that with only one arguable-exception we do everything we do currently without any modifications to WebRTC's code.)
Is this implementation going to be better for me, in maybe having fewer layers of abstraction between me and the underlying state machines, or worse, due to trying to target "embedded systems" in a programming language that tends to be less flexible (and so between those two considerations maybe leading me to have less ability to customize it without having to rewrite or hack on it directly)? I ask because I know a lot of other people see C and think "ah, finally this is simple", but I have not had any issues with the C++ codebase (the secret is to not even try to use Google's build system: just take all the .cc files you need--and only the ones you need--and link them together). (Oh, maybe even really simple: I bet this isn't going to support Windows?)
I know Orchid[0]! Another Pion developer and I were just talking about it; we are both really impressed. WebRTC is such an amazing technology that it can enable stuff like this (and it was never planned). We were trying to scheme a way to convince you to use Pion :p
> I am very curious to understand better how this stacks up against the usual C++ WebRTC implementation when code size isn't an issue,
I am biased, but I think if you only care about DataChannels they should be pretty equivalent! Both libraries use libusrsctp/OpenSSL for all the heavy lifting, so that should be the same. They do have different ICE implementations though, so that is probably going to be the first place you will hit issues. I encourage you to try it out though. If you hit any bugs/have any questions I would love to help!
> as just about everything can be subclassed and replaced
Oh interesting. I think you are right, this library wasn't really designed to be inherited/mutated. We do provide a struct with non-standard settings you can flip, so maybe we can add the things you need in there?
> I bet this isn't going to support Windows?
I plan on supporting it! I don't have access to a Windows machine, but I hope I can just put it in travis and never touch it again :)
If you want to reach out via email/GitHub issues I am happy to discuss things further. That is a pretty question-heavy comment, so I think I am missing some stuff on mobile.
I spent about five minutes just now--on my iPhone, to kill time before falling asleep, so I notably didn't even have a keyboard--barely glancing at your code on GitHub, and found a buffer overflow :( in your data channel DCEP notification implementation (and then spent like a half hour verifying it on my iPhone and typing this comment on my iPhone, so I am sadly now much more awake than when I started ;P).
You have MAX_DATA_CHANEL_NAME_LEN (which is misspelled; I was very confused when the GitHub search couldn't find it ;P) set to 255 and in RtcDataChannel you allocate name with that size (already suspicious, as in most other places you + 1 for a NUL byte; this is what caused me to search for usages: in samples/Common.c you incorrectly assume it is terminated, btw); but, in handleDcepPacket you get a 16-bit value for the length which you pass through dataChannelOpenFunc to onSctpSessionDataChannelOpen... and then it is that length (as pNameLen) which gets passed to STRNCPY (which I verified is a #define for strncpy) instead of the size of the buffer.
I had momentarily considered "maybe this field fundamentally can't be larger than 255 for some other reason I don't see here" (as I honestly couldn't remember what the length limitations for these things are), but the standard even makes clear that these fields can go up to the full 16-bit length:
> The DATA_CHANNEL_OPEN messages contains two variable length fields: the protocol and the label. A receiver must be prepared to receive DATA_CHANNEL_OPEN messages where these field have the maximum length of 65535 bytes.
(This then made me do a quick search for STRNCPY in your code, and I am noticing you are using the wrong length constants for most of the calls in the SDP parser; this happens to work, because they are all defined to be the same number--255--but is still highly suspicious and would make me want to do a much more thorough audit. Really, my recommendation would be to never call STRNCPY directly: abstract it behind a macro that enforces you pass a direct array and an offset so it can do sizeof() to get the right size at compile time.)
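A rough sketch of what I mean, assuming C11 is acceptable (it leans on `_Generic`, which older embedded compilers may lack, so a pre-C11 toolchain would need a different check). None of these names come from the SDK: `CHAR_ARRAY_SIZE`, `bounded_copy`, `COPY_INTO_ARRAY`, `channel_info`, and `dcep_read_label` are hypothetical. The byte offsets follow the DATA_CHANNEL_OPEN layout in the data channel establishment spec (message type 0x03, 12 fixed bytes, then label and protocol).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Yields sizeof(arr), but only compiles when arr is a real char array:
 * for a char* argument, &(arr) is char**, which matches no association. */
#define CHAR_ARRAY_SIZE(arr) \
    (sizeof(arr) * _Generic(&(arr), char (*)[sizeof(arr)]: 1))

/* Copies src_len bytes to dst + offset and NUL-terminates the result.
 * Returns 0 on success, -1 if it would not fit in dst_size bytes. */
int bounded_copy(char *dst, size_t dst_size, size_t offset,
                 const void *src, size_t src_len)
{
    if (offset >= dst_size || src_len > dst_size - offset - 1) {
        return -1;
    }
    memcpy(dst + offset, src, src_len);
    dst[offset + src_len] = '\0';
    return 0;
}

/* The destination size always comes from the array itself, never the caller. */
#define COPY_INTO_ARRAY(dst_array, offset, src, src_len) \
    bounded_copy((dst_array), CHAR_ARRAY_SIZE(dst_array), (offset), (src), (src_len))

#define DCEP_OPEN_TYPE       0x03 /* DATA_CHANNEL_OPEN message type */
#define DCEP_OPEN_HEADER_LEN 12   /* fixed part before label and protocol */

typedef struct {
    char name[256]; /* up to 255 bytes of label plus the NUL */
} channel_info;

/* Reads the label out of a DATA_CHANNEL_OPEN message. Returns 0 on success,
 * -1 if the message is malformed or the label does not fit in out->name. */
int dcep_read_label(const uint8_t *msg, size_t msg_len, channel_info *out)
{
    size_t label_len, protocol_len;

    if (msg_len < DCEP_OPEN_HEADER_LEN || msg[0] != DCEP_OPEN_TYPE) {
        return -1;
    }
    /* Both lengths are 16-bit, network byte order, and attacker-controlled. */
    label_len    = ((size_t)msg[8]  << 8) | msg[9];
    protocol_len = ((size_t)msg[10] << 8) | msg[11];
    if (label_len + protocol_len > msg_len - DCEP_OPEN_HEADER_LEN) {
        return -1; /* lengths claim more bytes than were received */
    }
    return COPY_INTO_ARRAY(out->name, 0, msg + DCEP_OPEN_HEADER_LEN, label_len);
}
```

This version rejects an oversized label instead of truncating it; truncation would also be defensible, but the point is that the length taken from the wire is only ever compared against the buffer, never used as the copy bound. Passing a `char *` instead of an array to `COPY_INTO_ARRAY` fails to compile.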
> We were trying to scheme a way to convince you to use Pion :p
Honestly, I guess I am just of the firm belief that this is 2020, and no one should be writing code in "Pure C" under any circumstance :/. You might totally find a place where I made a mistake like this in my code (as I am, after all, still using C++, and I have very few fundamental guarantees), but when I fix it, it will fix that entire class of bug for all of my code everywhere, as I have templated abstractions for things like buffer management (where I have even been trying to slip in some pseudo-dependent types to elide runtime checks) and am able to use destructors to manage my resources; my biggest issue is dealing with asynchronous coroutines running code in objects as they get deallocated, but even there I have been able to design a mostly type-safe abstraction (that if nothing else should catch the coding mistake at runtime and safely terminate the process). Can I maybe convince you to rewrite this in C++20 or Rust? :(
> We were trying to scheme a way to convince you to use Pion :p
FWIW, I just determined that "Pion" was actually referring to yet another implementation of WebRTC you have done, this one in Go, and so is actually quite interesting from a security perspective (for the very reasons why I believe this one to be dangerous). I had not seen it before.
My biggest concern is that I only have 15MB of memory available on iOS in the Network Extension (an absolutely brutally small amount of memory), and so I have a hard time using anything written in Go (I had actually tried before, linking parts of Ethereum written in Go, and then having to come up with an alternative plan) but I will add it to my list of projects to evaluate. If it can manage to only use a megabyte or two of RAM as overhead I can seriously consider it (I have infinite text segment, though).
(My current intention is to rewrite a lot of Orchid's code in Rust at some point; their async/await support wasn't quite ready when I started this, and I had a deadline, but it has since been released. I actually feel bad that I wrote as much new security code in C++, but I have gone to great lengths to make what I am doing as safe as possible and I allocate a good amount of my time to safety engineering... and even then I will note that coroutine issue would likely be obvious how to make impossible to code in Rust.)
(Also: a friend of mine has noted that I come off as arrogant in my previous comment. I don't agree with a lot of his advice for it, but I do agree that I come off as arrogant. Part of it is probably that I am a bit arrogant, which I acknowledge and will admit sucks. I made the comment about the misspelled identifier as I thought it would be a light and humorous moment in an otherwise dark comment, not because I wanted to rub anything in; and I accept that "this didn't take me long to find" makes it seem like I am showing off, but I really really really wanted to make it clear that I didn't spend half the day analyzing the code to find one bug. I dislike Rust and I dislike the Rust brigade, but they are actually "correct", and to have an SDK for an AWS product that I actually have been wanting to use on another of my projects come out marketing how it is written in "Pure C"--along with all the bugs that one would expect from a project written in "Pure C"--is extremely disappointing and should be held up as an example of why I likely shouldn't even be tolerating C++ and maybe should code everything in Rust no matter the tradeoffs.)
> If it can manage to only use a megabyte or two of RAM as overhead I can seriously consider it
Let's see if we can make this happen! If you are interested I would love to help. I have been looking at making Pion WebRTC work with TinyGo; then it would work really well!
> I come off as arrogant in my previous comment
You are fine! Everyone has different communication styles. You bring up valid points, and I really appreciate your enthusiasm. Earlier in my career it would have hurt though, but I have been through much worse. I will work on adding a spell checker to travis, not sure how else to stop something like that from happening again.
> "Pure C"--is extremely disappointing
I walk two very distinct paths in life. On one, I work on Pion and have very idealistic goals. I am proud that it builds fast, is accessible, is community owned, and is safe code. The unfortunate reality is that many people are never going to use it, and those that use it usually aren't in a position to pay. The majority of my working hours in the last two years have been on it.
There is a demand for the C SDK. People have things they want to build, and I want to empower them. I agree that C has issues, but all I can do is try to fight that (sanitizers, fuzzing, code coverage). I am excited about what people are going to build with it. I try to be pragmatic, and my paying work lets me go and accomplish things like Pion.
What are your thoughts on the Janus WebRTC signaling server, and do you consider targeting it?
I am building a drone fleet manager with live video and integration with other video sources and have been working with Janus because it allows me to be cloud/premises independent. What are your thoughts on cloud independence for WebRTC?
Intel has just released Open WebRTC Toolkit (OWT) [1]. It is an end-to-end audio/video communication development toolkit based on WebRTC, optimized for Intel Architecture.
At least Amazon seems to care about supporting sanitizers on the build, so I wonder how much it has been validated through fuzzing and similar security validation processes.
Which is kind of relevant if such an implementation is to be burned into a piece of hardware that might never see any kind of updates.
Nice. I have been on and off with GStreamer, and combined with all of the Amazon stack for video processing, it makes a lot of sense for surveillance applications!
1.) Yes it does! It has OnDataChannel, you can set it on the PeerConnection. createDataChannel is a WIP, hopefully this weekend.
2.) Yes. The WebRTC peers can do Data only if they want. We provide STUN/TURN servers, and it works with any compliant WebRTC implementation. This repo exists just to serve embedded devices (which was harder before)
3.) Yes it is fully P2P. Even when going via TURN it is E2E encrypted: Amazon relays the traffic but can’t decrypt it; only the peers can.