Oh, this all brings back memories of Sococo in the 2000s. We faced all these problems and had similar solutions to them all.
We even had a rapidly adapting network make-and-break recovery layer. You'd unplug your laptop from a wired connection and switch to wireless, and we'd recover in milliseconds. You barely heard a click.
The encryption issue is fun: we had an in-band rotate-key message. The receiver loaded new keys and tried them in sequence to ease the turnover time, since out-of-order packets and the like could make it ambiguous for a short while which key to use. A cache, plus aging keys out, made it work pretty well.
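For anyone curious what "try keys in sequence, cache them, age them out" can look like on the receiving side, here is a minimal sketch under stated assumptions. It is not Sococo's code. To stay stdlib-only, "decryption" is stood in for by an HMAC-SHA256 check (a packet opens under a key if its tag verifies), and the names `seal`, `try_open`, `RotatingKeyReceiver`, and the 2-second `grace_s` are all hypothetical.

```python
import hashlib
import hmac
import time

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def seal(key, payload):
    """Stand-in for encryption: append an HMAC-SHA256 tag (the payload is NOT encrypted)."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def try_open(key, packet):
    """Return the payload if the tag verifies under `key`, else None."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


class RotatingKeyReceiver:
    """Hypothetical receiver: caches recent keys and ages superseded ones out."""

    def __init__(self, grace_s=2.0, clock=time.monotonic):
        self.grace_s = grace_s  # how long a superseded key stays usable (assumed value)
        self.clock = clock
        self.keys = []          # [key, superseded_at or None] pairs, newest first

    def on_rotate_key(self, key):
        """Handle an in-band rotate-key message: retire the current key, install the new one."""
        now = self.clock()
        for entry in self.keys:
            if entry[1] is None:
                entry[1] = now
        self.keys.insert(0, [key, None])

    def _expire(self):
        """Age out keys that were superseded more than grace_s seconds ago."""
        now = self.clock()
        self.keys = [e for e in self.keys if e[1] is None or now - e[1] <= self.grace_s]

    def receive(self, packet):
        """Try cached keys newest-first; a late, out-of-order packet may need an older key."""
        self._expire()
        for key, _ in self.keys:
            payload = try_open(key, packet)
            if payload is not None:
                return payload
        return None  # no cached key fits: drop the packet
```

Trying the newest key first keeps the common case to one check; the older keys only cost extra work during the short turnover window, and aging them out bounds how long that ambiguity can last.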
Remixing on user stations proved to be problematic (as mentioned elsewhere in this thread). You'd think that if 6 people at one site were conferencing with a dozen elsewhere, you could elect one at each site to mix-and-forward. But corporate networks made it hard to determine who was 'adjacent': they were often layered, and without UPnP (is that what the router protocol is called?) you couldn't tell if somebody at the next desk was even in your company.
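The naive per-site scheme is easy to write down once you assume you already know who shares a site, which is exactly the part that proved hard. A minimal sketch, assuming a hypothetical `peers` map of peer ID to site label, a lowest-ID-wins election rule, and 16-bit PCM frames of equal length mixed by saturating addition:

```python
from collections import defaultdict

INT16_MIN, INT16_MAX = -32768, 32767


def elect_mixers(peers):
    """Pick one mixer per site: the lowest peer ID wins.

    `peers` maps peer_id -> site label. The site label is a hypothetical input;
    in practice, learning who was 'adjacent' was the hard part.
    """
    by_site = defaultdict(list)
    for peer_id, site in peers.items():
        by_site[site].append(peer_id)
    return {site: min(ids) for site, ids in by_site.items()}


def mix_frames(frames):
    """Mix equal-length lists of 16-bit PCM samples by summing with saturation."""
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))  # clamp instead of wrapping
    return mixed
```

The elected station would send one mixed stream to the other site instead of six separate ones; if the site labels are wrong, you end up electing a "local" mixer that is actually across a WAN link.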
We had up to 100 people in a conference, and our enter-the-conference time was on the order of 100ms. Click into an all-hands, and be able to hear everybody before your finger left the mouse button. It was wonderful.
Sococo today is a sad shadow of that. They went open-source and lost all our IP instantly. Just another WebRTC client last I knew.
There was little or nothing in WebRTC to match what we'd spent 5 years creating. So they were back to 1-5 people in a conference, with 1-3 second connect times and no resilience to network changes.
The excuse they gave was "We can't rely on 6 people in Iowa for our core IP". So they switched to some open-source mix node that was the pet project of 2 guys in Italy: two academics who gave it hardly any attention. And it had zero IP; it was just a collection of APIs stitched together to give you the impression of having a mix node.
We said all that at the time. But such was the power of the magic words "Open Source" that it all bounced off their mental shields.
Maybe I'm kinda leaning beyond the practical/relevance limits of "old code still interesting", but could you open-source the implementation you came up with, given the passage of time? Lots of people are stuck on low-bandwidth links so a codebase optimized for slower connections would absolutely fly (and consistently so, for everyone, under lots of conditions), and everyone always wants to use less bandwidth anyway.
From an implementation perspective, it's also always good to have explicitly bespoke designs out there to contrast against the bog-standardness of WebRTC, with its standard "can't be helped" set of limitations and flaws.
I also find it very fascinating to hear that OSS was the cause of (headdesk-inducing) myopia and blindsiding. My (naive, distant, apparently out of date) impression was that open-source was incorrectly perceived as the inferior option in the stereotypical case. I guess the entropy pool really can go in all the directions...
> Lots of people are stuck on low-bandwidth links so a codebase optimized for slower connections would absolutely fly (and consistently so, for everyone, under lots of conditions), and everyone always wants to use less bandwidth anyway.
You would think so but why have solutions like Mumble[0], which allows for extremely low-latency and high-quality voice calls and has existed at least since the mid 2000s, not become more popular during the pandemic? Or why didn't people at Zoom and MS Teams at least learn from Mumble?
> You would think so but why have solutions like Mumble[0], which allows for extremely low-latency and high-quality voice calls and has existed at least since the mid 2000s, not become more popular during the pandemic?
Because there is no official Mumble server.
People know how to download an application, click "install", and register an account. But ask them to open a port on their router's firewall/NAT, or set up DNS, and you instantly lose 99.9% of your user base.
It could have been different, but lay people never had the chance to install their own server. They couldn't do it with dial-up, they didn't have the upload bandwidth with ADSL, they didn't have fixed IP addresses, there's the NAT hurdle, outgoing SMTP is blocked everywhere… that ship has sailed. Even I host my websites on a remote virtual machine I rent.
Mumble has a bunch of issues, the main one being its confusing UI, especially on mobile clients. I'm regularly in Mumble conferences, and accidentally switching rooms instead of pressing the push-to-talk button, for example, happens quite often.
There are also a bunch of more technical problems. For example as mumble uses a udp protocol it handles dropped frames badly, like on a spotty wireless connection. This results in missing audio: not only does sound fail to arrive (in both directions), but the client also doesn't tell you something's wrong.
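The "doesn't tell you" part is fixable independently of the transport: a receiver can count gaps in packet sequence numbers and surface the loss rate. A minimal sketch, assuming RTP-style 16-bit sequence numbers that wrap at 2**16; the `LossMonitor` class and its 5% `warn_threshold` are hypothetical, not anything Mumble ships:

```python
class LossMonitor:
    """Hypothetical receiver-side loss counter for 16-bit, RTP-style sequence numbers."""

    def __init__(self, warn_threshold=0.05):
        self.warn_threshold = warn_threshold  # assumed: warn above 5% loss
        self.highest = None   # newest sequence number seen so far
        self.expected = 0     # packets the sender must have sent, judging by sequence numbers
        self.received = 0     # packets that actually arrived

    def on_packet(self, seq):
        seq &= 0xFFFF
        if self.highest is None:
            self.highest, self.expected, self.received = seq, 1, 1
            return
        delta = (seq - self.highest) & 0xFFFF  # forward distance modulo 2**16
        if delta == 0:
            return                             # duplicate of the newest packet: ignore
        if delta < 0x8000:                     # newer packet, possibly after a gap
            self.expected += delta
            self.highest = seq
        self.received += 1                     # late (reordered) packets still count

    def loss_fraction(self):
        if self.expected == 0:
            return 0.0
        # Older duplicates can push received above expected; clamp at zero.
        return max(0.0, 1.0 - self.received / self.expected)

    def should_warn(self):
        """True when the user ought to be told audio is being lost."""
        return self.loss_fraction() > self.warn_threshold
```

Feeding it 65534, 65535, 2, 3 counts six expected packets across the wraparound and four received, so it reports one third lost; if packet 0 then shows up late, the estimate drops to one sixth.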
Mumble also has problems when your audio setup changes; these require restarting the client.
Also no video, and no simply calling people: you need to go to a server, and they need to go there, too. It basically stopped innovating some time ago, and everyone else moved on.
>For example as mumble uses a udp protocol it handles dropped frames badly, like on a spotty wireless connection.
WebRTC also uses UDP, as does virtually every other real-time conferencing platform since the internet has existed. TCP is too constraining to use for voice because every single packet retransmission only increases delay further. Dropping packets when they don’t arrive on time is necessary in order to minimize delay, which is one of the principal goals of Mumble.
The real solution to dropped packets is not TCP; it’s a quality jitter buffer (JB), and if you don’t like Mumble’s performance in that respect, then you need to look at its JB. A good JB will buffer and reorder packets within some statistical measure of network jitter, but its behavior is very explicitly not to retransmit.
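To make "buffer and reorder within some statistical measure of jitter, but never retransmit" concrete, here is a deliberately small sketch. It is nowhere near NetEQ and is not Mumble's implementation. Assumptions: fixed 20 ms frames, sequence numbers that never wrap, a playout depth of mean + `k` standard deviations of relative transit delay (`k=2` is an arbitrary choice), and that depth is only chosen once, before playout starts. `JitterBuffer` and its methods are hypothetical names.

```python
import heapq
import math
import statistics
from collections import deque


class JitterBuffer:
    """Minimal sketch: reorder, hold back a jitter-sized delay, conceal gaps, never retransmit."""

    def __init__(self, frame_ms=20, k=2.0, window=50):
        self.frame_ms = frame_ms               # audio frame duration (assumed 20 ms)
        self.k = k                             # std-devs of jitter to absorb (assumed)
        self.transits = deque(maxlen=window)   # recent (arrival - send) times in ms
        self.heap = []                         # (seq, payload) waiting for playout
        self.next_seq = None                   # next sequence number due for playout
        self.late_drops = 0

    def target_depth(self):
        """Frames to hold back: mean + k*stddev of relative delay, in whole frames."""
        if len(self.transits) < 2:
            return 1
        base = min(self.transits)              # subtracting the minimum cancels clock offset
        rel = [t - base for t in self.transits]
        jitter_ms = statistics.fmean(rel) + self.k * statistics.pstdev(rel)
        return max(1, math.ceil(jitter_ms / self.frame_ms))

    def push(self, seq, payload, send_ms, arrival_ms):
        """Buffer an arriving packet; its timestamps feed the jitter estimate."""
        self.transits.append(arrival_ms - send_ms)
        if self.next_seq is not None and seq < self.next_seq:
            self.late_drops += 1               # its slot already played: drop, never ask for a resend
            return
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        """Call once per frame tick. Returns a payload, or None meaning 'conceal this frame'."""
        if self.next_seq is None:
            if not self.heap or len(self.heap) < self.target_depth():
                return None                    # still pre-buffering
            self.next_seq = self.heap[0][0]    # start playout at the oldest buffered packet
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)           # discard duplicates of frames already played
        seq = self.next_seq
        self.next_seq += 1                     # the playout clock advances every tick
        if self.heap and self.heap[0][0] == seq:
            return heapq.heappop(self.heap)[1]
        return None                            # missing at its slot: conceal and move on
```

A packet that arrives after its playout slot is simply counted and discarded, which is the point: waiting for it (or for a retransmission) would add delay to every frame after it. A production JB would also keep adapting the depth during playout and stretch or shrink audio to do so.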
Google cheated with their implementation of WebRTC by purchasing Global IP Solutions, which gave them the most advanced jitter buffer in the world at the time: NetEQ.
Long enough Mumble sessions will desync, and the developers don't want to do anything about it. Mobile clients all suck. Mumble is great, but it has flaws too :)
That's what happens when arrogant MBAs make technology decisions they don't understand and aren't concerned with the details because "their way is best".
Sorry to burst their bubble, but most of the tech world runs on "6 people in Iowa" for niche technologies.
It seems that, often in history, better technologies are lost as the wheel is reinvented for no apparent reason, whether due to NIH syndrome, political/business concerns, or sheer ignorance. Hubris for the overconfident and a loss for humanity are often the results.
Some of the engineers involved may have some of the components. I know Tom has the desktop client. Chris may have a mix node machine in a closet somewhere :)
As for the backend, I'm not so sure. The guy who wrote most of that is a middle manager at Microsoft. My son wrote some of it, but he didn't keep any.
It did take a lot of screen real estate when open.
But it had its advantages. You could see who was talking with whom - the bobble-heads even blinked. You could visually show when you were busy (close office door). A company meeting took seconds for everyone to assemble - you'd see folks blinking into the big room.
I had a separate monitor for leaving it up, but I had the space for that. It was always an issue.