Oh, this all brings back memories of Sococo in the 2000s. We faced all these problems and had similar solutions to them all.
We even had a rapidly adapting network make-and-break recovery layer. You unplug your laptop from a wired connection, switch to wireless - we recovered in milliseconds. You heard barely a click.
The encryption issue is fun - we had a rotate-key message in-band. The receiver loaded new keys and tried them in sequence to ease the turnover time - out-of-order packets etc could make it ambiguous for a short while which key to use. A cache and aging keys out made it work pretty well.
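A minimal sketch of that kind of receiver-side key trial (illustrative only, not the actual Sococo code), using AES-GCM from Python's `cryptography` package: the receiver keeps a small cache of recent keys, tries them newest-first on each packet, and ages out keys that haven't matched in a while.

```python
# Hypothetical sketch of trying recently rotated keys in sequence.
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

class RotatingKeyReceiver:
    def __init__(self, max_age_seconds=30.0):
        self.keys = []              # list of [key_bytes, last_used_timestamp], newest first
        self.max_age = max_age_seconds

    def add_key(self, key):
        """Called when a rotate-key message arrives in-band."""
        self.keys.insert(0, [key, time.monotonic()])

    def decrypt(self, nonce, ciphertext):
        """Try each cached key newest-first; out-of-order packets may still
        need an older key for a short while after a rotation."""
        now = time.monotonic()
        for entry in self.keys:
            key, _ = entry
            try:
                plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
                entry[1] = now      # refresh: this key is still in use
                self._expire(now)
                return plaintext
            except InvalidTag:
                continue            # wrong key (or corrupt packet); try the next one
        raise InvalidTag("no cached key matched")

    def _expire(self, now):
        # Age out keys that haven't decrypted anything recently.
        self.keys = [e for e in self.keys if now - e[1] < self.max_age]
```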
Remixing on user stations proved to be problematic (mentioned elsewhere on this thread). You'd think if 6 people at one site were conferencing with a dozen elsewhere, you could elect one at each site to mix-and-forward. But corporate networks made it hard to determine who was 'adjacent' - they were often layered, and without UPnP (is that what the router protocol is called?) you couldn't tell if somebody at the next desk was even in your company.
We had up to 100 people in a conference, and our enter-the-conference time was on the order of 100ms. Click into an all-hands, and be able to hear everybody before your finger left the mouse button. It was wonderful.
Sococo today is a sad shadow of that. They went open-source and lost all our IP instantly. Just another WebRTC client last I knew.
There was little or nothing in WebRTC to match what we'd spent 5 years creating. So they were back to 1-5 people in a conference, with 1-3 second connect times, and no resilience to network changes.
The excuse they gave was "We can't rely on 6 people in Iowa for our core IP". So they switched to some open source mix node that was the pet project of 2 guys in Italy. Two academics, who gave it hardly any attention. And it had zero IP; just a collection of APIs stitched together to give you the impression of having a mix node.
We said all that at the time. But such was the power of the magic words "Open Source" that it all bounced off their mental shields.
Maybe I'm kinda leaning beyond the practical/relevance limits of "old code still interesting", but could you open-source the implementation you came up with given the passage of time? Lots of people are stuck on low-bandwidth links so a codebase optimized for slower connections would absolutely fly (and consistently so, for everyone, under lots of conditions), and everyone always wants to use less bandwidth anyway.
From an implementational perspective it's also always good to have explicitly bespoke designs out there to contrast against the bog-standardness of WebRTC with its standard "can't be helped" set of limitations and flaws.
I also find it very fascinating to hear that OSS was the cause of (headdesk-inducing) myopia and blindsiding. My (naive, distant, apparently out of date) impression was that open-source was incorrectly perceived as the inferior option in the stereotypical case. I guess the entropy pool really can go in all the directions...
> Lots of people are stuck on low-bandwidth links so a codebase optimized for slower connections would absolutely fly (and consistently so, for everyone, under lots of conditions), and everyone always wants to use less bandwidth anyway.
You would think so but why have solutions like Mumble[0], which allows for extremely low-latency and high-quality voice calls and has existed at least since the mid 2000s, not become more popular during the pandemic? Or why didn't people at Zoom and MS Teams at least learn from Mumble?
> You would think so but why have solutions like Mumble[0], which allows for extremely low-latency and high-quality voice calls and has existed at least since the mid 2000s, not become more popular during the pandemic?
Because there is no official Mumble server.
People know how to download an application, click "install", and register an account. But ask them to open a port on their router's firewall/NAT, or set up DNS, and you instantly lose 99.9% of your user base.
It could have been different, but lay people never had the chance to install their own server. They couldn't do it with Dial Up, they didn't have the upload bandwidth with ADSL, they didn't have fixed IP addresses, there's the NAT hurdle, outgoing SMTP is blocked everywhere… that ship has sailed. Even I host my websites on a remote virtual machine I rent.
Mumble has a bunch of issues, the main one being its confusing UI, especially on the mobile clients. I'm regularly in Mumble conferences, and accidentally switching rooms instead of pressing the push-to-talk button, for example, happens quite often.
There's also a bunch of more technical problems. For example, since Mumble uses a UDP protocol, it handles dropped frames badly, like on a spotty wireless connection. The result is missing audio. Not only does sound fail to arrive (in both directions), it also doesn't tell you something's wrong.
Mumble has problems with changing audio setups, which requires a restart of the client.
Also no video, no simply calling people - you need to go to a server and they need to go there, too. They basically stopped innovating some time ago and everyone else moved on.
>For example, since Mumble uses a UDP protocol, it handles dropped frames badly, like on a spotty wireless connection.
WebRTC also uses UDP—as well as virtually every other real-time conferencing platform since the internet existed. TCP is too constraining to use for voice because every single packet retransmission only increases delay further. Dropping packets when they don’t arrive on time is necessary in order to minimize delay, which is one of the principal goals of Mumble.
The real solution to dropped packets is not TCP—it’s a quality jitter buffer, and if you don’t like mumble’s performance in that respect then you need to look at the JB. A good JB will buffer and reorder packets within some statistical measure of network jitter, but the behavior is very explicitly not to retransmit.
Google cheated with their implementation of WebRTC by purchasing Global IP Solutions, which gave them the most advanced jitter buffer in the world at the time: NetEQ.
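To make the "reorder, but never retransmit" behavior concrete, here is a toy jitter buffer sketch (nowhere near NetEQ, just the core idea): packets are held briefly and reordered by sequence number, and anything that misses its playout deadline is concealed rather than retransmitted.

```python
# Toy jitter buffer: reorder within a small window, never retransmit.
class JitterBuffer:
    def __init__(self, depth=3):
        self.depth = depth          # how many frames of delay we tolerate
        self.buffer = {}            # seq -> payload
        self.next_seq = None        # next sequence number to play out

    def push(self, seq, payload):
        if self.next_seq is None:
            self.next_seq = seq
        if seq >= self.next_seq:    # drop anything that arrives too late
            self.buffer[seq] = payload

    def pop(self):
        """Called once per playout tick. Returns a payload or None
        (None = wait a bit longer, or conceal a lost frame)."""
        if self.next_seq is None:
            return None
        payload = self.buffer.pop(self.next_seq, None)
        if payload is None and len(self.buffer) < self.depth:
            # Not due yet: keep waiting for the missing packet.
            return None
        # Either we have the packet, or we give up on it and move on.
        self.next_seq += 1
        return payload
```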
Long enough mumble sessions will desync and the developers don't want to do anything about it. Mobile clients all suck. Mumble is great, but it has flaws too :)
When arrogant MBAs make technology decisions they don't understand and aren't concerned with the details because "their way is best".
Sorry to burst their bubble, but most of the tech world operates on a "6 people in Iowa" for niche technologies.
It seems often in history that better technologies are lost as the wheel is reinvented for no apparent reason, either due to NIH syndrome, political/business concerns, or out of sheer ignorance. Hubris to the overconfident and a loss for humanity are often the results.
Some of the engineers involved may have some of the components. Tom, I know, has the desktop client. Chris may have a mix node machine in a closet somewhere :)
The backend I'm not so sure about. The guy who wrote most of that is a middle manager at Microsoft. My son wrote some, but he didn't keep any of it.
It did take a lot of screen real estate, when open.
But it had its advantages. You could see who was talking with whom - the bobble-heads even blinked. You could visually show when you were busy (close office door). A company meeting took seconds for everyone to assemble - you'd see folks blinking into the big room.
I had a separate monitor for leaving it up, but I had the space for that. It was always an issue.
>> There is no off the shelf software that would allow us to support calls of that size while ensuring that all communication is end-to-end encrypted, so we built our own open source Signal Calling Service to do the job
But wasn't there Jitsi? [0]
I think it's great we have competition among Free Software projects so that both can improve. But sometimes I feel like maybe duplicated efforts create two 5/10 solutions. Instead, what we really want is one 8/10 solution, or better.
There is some duplication of effort but sometimes progress happens via rewrites and that might actually be a faster way to an 8/10 system than direct collaboration?
Also I think it’s interesting to see how this builds on Google’s work (the googcc algorithm). Which of course builds on previous open source work. The underlying technical collaboration happens even with quite different organizational goals and different codebases.
All of these things are built around WebRTC to some degree, which was built on previous open source work and standards. We all benefit from many contributions over many years.
Same, all my hobby groups switched from Jitsi to Google Meet due to disconnect issues. And Meet is actually pretty good. Can't compare to Zoom as I don't know anybody using it.
It's the first of the links where they say "When building support for group calls, we evaluated many open source SFUs", so I suppose it's either not one of the two with "adequate congestion control", or is the one that did not reliably scale past 8 participants?
Can you elaborate on the scaling issues you hit (and with which implementations?) I've used Janus + MediaSoup but not Jitsi before for WebRTC audio for web based multiplayer games.
Daily.co has a developer friendly offering that accomplishes this as well. Many offerings available and many reasons to not take on this added complexity.
Signal uses phone numbers as identifiers because it helps control spam. There's no reason - at all - they couldn't just throw the gates open and say "we hand out GUIDs per account and you do some auth process to tie them together to a master key" - but that's not why they're not implementing it.
Signal without the phone number requirement becomes THE way to move data from anything to anything - malware would be written which set up a Signal account and used that for command and control. I'd wire all my service messages on servers to go via Signal - it's the perfect mechanism, tied to the device I want them on.
The answer is, Signal doesn't have a way to offer "usernames only" which wouldn't almost immediately explode the services usage far, far beyond what they can plausibly sustain in server costs.
EDIT: Signal desperately needs to figure out an actual monetization scheme for use-cases which do not need absolute privacy. Billing businesses for verified access to the service would be one way. Ideology and all is great, but if they can't pay the bills none of it is going to matter.
Users aren't personally identifiable by contacts or their address
Communication is authenticated and private
No person or server can access contact lists, message history, or other metadata
Resist censorship and monitoring at the local network level
Resist blacklisting or denial of service against users
Accessible and understandable for non-technical users
Reliability and interactivity comparable with traditional IM services
OK, the accessibility part is not implemented, but that is beside the point. I am not arguing in favor of using Ricochet here, so it does not matter anyway. FWIW, it has been audited.
> The answer is, Signal doesn't have a way to offer "usernames only" which wouldn't almost immediately explode the services usage far, far beyond what they can plausibly sustain in server costs.
Why can’t they let heavy users run their own servers?
> Signal without the phone number requirement becomes THE way to move data from anything to anything - malware would be written which setup a Signal account and used that for command and control.
Element doesn't require a phone number and I have no malware issues... also email doesn't.
I do not even know what the problem is. Being able to create as many accounts as one wants? How is that used for "command and control" though? I am missing something here I think.
I don't see it either.. maybe a Signal employee posted this comment to try to justify phone number collection because they can sell those at a premium (or a secret order from the Gov. forces them to do it).
Also, what I want is accounts protected solely by username and password (actually, just a 50 character password without login would satisfy me)... this is one reason why I am slowly moving away from gmail. 2FA doesn't help me because I use my own device to login and if they get compromised, I am screwed either way.
i call this a bs strawman. this is just nonsense because, as you said, element and email are perfect examples of decentralized networks. you also have the fediverse that works on trust between a local network as well as any trusted servers, and still that system doesn't just explode.
> Signal uses phone numbers as identifiers because it controls spam.
Signal uses phone number as identifiers because it's the easiest way to find other users. There's no other reason Signal uses phone numbers as a primary ID.
I wonder if I ever needed contact discovery, as in, actual contact discovery, such as: you start typing foo, then you get "foobar", "foobaz", and so forth. It definitely complicates things.
In India at least, unused numbers are recycled back to new subscribers after 6 or so months of no activity. Blocking merely on the basis of the phone number would create issues in the future, though.
Usernames work. You could even use UUIDs these days as QR is an increasingly common way of sharing data. But yeah, usernames would be a great improvement.
Usernames do not just work. The Signal team is not unaware of usernames and Signal is not a weird scheme to get all your phone numbers. The difference between Signal and systems that use usernames (or email addresses) is that Signal deliberately doesn't operate a serverside directory or buddy list service. By contrast, other relatively popular messengers essentially keep a plaintext database of who talks to who on their service.
What phone numbers allow Signal to do is to piggyback off the contact lists people already have on their devices.
I have like 6 phone numbers listed in my phone contact list for my sister alone, because she changes numbers every time she job hops and gets a new work phone, so I really question how well phone numbers work either. Most of those numbers are no longer hers, but I've lost track of which ones.
I feel like there's a bubble of people for whom phone numbers are perhaps a useful, durable identity token, but I really think it's very much a bubble. Most people's phone numbers change fairly frequently.
> Most people's phone numbers change fairly frequently.
Citation?
I think it’s just as easy to claim (and I certainly would bet that) most people’s phone numbers do not change frequently - I’d wager on something like 70% remain the same for 5+ years.
My phone number has literally never changed (going 20+ years now). Perhaps it's different in the US somehow (e.g. people use their work number for private purposes too), but in my experience private numbers tend to remain the same.
Not only that, but how can I use Signal on desktop without it? Can I do that? Plus every single citizen here who has a phone number has it tied to their identity, even if they buy the SIM card at the gas station, for example. They have to verify and confirm their identity once a year. Its purpose is to keep track of who uses what number.
You can use signal on a desktop but you still need to tie your account to a phone number. It has a mechanism to share your history and all that to a desktop app, with key certification managed (afaik) from your phone app (and it requires periodic re-pairing of it if you're not using it regularly). It's not really a great experience imo.
You can here too, I just know a lot of people who don't. And people who use their work phone (even if I think that's a bad idea) for personal use don't always have the option of keeping the number.
I operate an E2EE messaging service that uses email instead of phone numbers as user IDs
My servers also don't keep any record of conversations or "buddy lists". Yes, contact lists get wiped when changing device and email addresses aren't as easily stored in phone books (I don't even access a user's phone book for that matter). Just saying that it's possible to not use phone numbers. Granted not as convenient once you've built your entire user base off of their phone numbers
So the Signal server has no information about phone number X talking to or being in contact with phone number Y at all?
> By contrast, other relatively popular messengers essentially keep a plaintext database of who talks to who on their service.
I know, there are plenty of not-so-privacy-preserving messengers out there. The way Ricochet[1] and Briar[2] does it is probably the most privacy-preserving one, and it can be made extremely convenient.
It's rather funny to me that I still get notifications that someone I knew ages ago (but who is still in my contact list) is now on Signal.
I find it funny rather than scary because I knew the tradeoffs Signal decided on even before I started using it. The fact that you use Signal itself is not a secret. If you need that to be secret, use a burner phone or something else. But I'm pretty sure not everyone else knew that when they joined.
In one case, I got notification that a guy I knew in high school had joined Signal, he was pretty far left then, and googling him I found out he was extremely far left now (splinter of a splinter of an anti-electoral Maoist group). I sent him a friendly note welcoming him, and explaining the basic thing I've explained here, that everyone can see you joining signal.
One way would be for Signal to create a hash of all your contacts' phone numbers and look up whether those hashes exist on the server. No contact details needed on the server, just the hash of the phone number connected to Signal's user ID.
> The first instinct is often just to hash the contact information before sending it to the server. If the server has the SHA256 hash of every registered user, it can just check to see if those match any of the SHA256 hashes of contacts transmitted by a client.
> Unfortunately, this doesn’t work because the “preimage space” (the set of all possible hash inputs) is small enough to easily calculate a map of all possible hash inputs to hash outputs. There are only roughly 10^10 phone numbers, and while the set of all possible email addresses is less finite, it’s still not terribly great. Inverting these hashes is basically a straightforward dictionary attack. It’s not possible to “salt” the hashes, either (they always have to match), which makes building rainbow tables possible.
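The dictionary attack described in that quote is easy to demonstrate; here's a tiny sketch (with a deliberately shrunken search space) that inverts a SHA-256 phone-number hash by brute force, which is exactly why plain hashing doesn't anonymize phone numbers:

```python
# Demonstration of why hashing phone numbers is not private:
# the input space is small enough to enumerate.
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# Pretend this is a hash the server received from a client's contact list.
secret_number = "+15551230042"
leaked_hash = sha256_hex(secret_number)

# Attacker enumerates the (tiny, for demo purposes) candidate space.
# A real attacker would enumerate roughly 10^10 numbers, which is cheap.
for n in range(10000):
    candidate = f"+1555123{n:04d}"
    if sha256_hex(candidate) == leaked_hash:
        print("recovered:", candidate)
        break
```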
Regardless of this, Ricochet and Briar both use Tor hidden services and they are metadata-free. You do not send any metadata to any servers. I have a link to the design of Ricochet that is easily digestible in some of my other comments.
Usernames aren't the solution to anonymity. Your username here isn't anonymous. And how do you share a username without connecting another identity? It's the same problem as using a phone number (supposing you're American; it's not too different if your ID is connected to your phone number).
I would rather prefer doing it the Ricochet way, yes:
> The recipient can calculate the sender's contact ID based on the public key, and authenticate it by verifying the signature on the request. This proves that the sender can publish the hidden service represented by their contact ID.
This is using IDs. For convenience of sharing and receiving: could be copy pasted (share button works on smartphones, too), and QR codes could be used.
Ricochet is metadata-free, and it can (or is, actually) be resistant to traffic analysis, too. No one knows who you are, and no one knows who you talk to.
Can you explain this a little more? Because the problem I see is that with any unique identifier it is hard to share in a hostile environment. Let's say I want to share my contact information with someone here but also share my contact information with someone on Reddit and not reveal that I am godelski on HN. Because if I use the identifier "1234" here and "1234" on Reddit then I've connected those two accounts.
The only way I see it working out is with temporary identifiers or expiring links like FF Send used to have. I saw some users talking about this in the Signal community forum[0]. I agree with the users who are trying to say they want to use usernames to be anonymous, not as a way to hide a phone number.
I do not think that OP was referring to implementing it the same, or even in a similar way, but to use a username/password pair. OP is free to correct me if I am wrong though.
In any case, elimination of metadata done right is the way Ricochet[1] does it. The recipient can calculate the sender's contact ID based on the public key, and authenticate it by verifying the signature on the request. This proves that the sender can publish the hidden service represented by their contact ID. You can read more about it here: https://github.com/ricochet-im/ricochet/blob/master/doc/desi...!
I don't know what their threat model is, but it's interesting that they don't seem too bothered about reducing metadata collection potential on the server. I bet you could put together some pretty interesting graphs of who is talking to whom, how much they talk and when.
I'm interested in an explanation for that too. Signal recently introduced the same sender key method for groups [1] and, correct me if I'm wrong, this is the first formal mention of the "Selective Forwarding" method we have.
I get that Signal's open-source implementation doesn't store this type of metadata, but it is very misleading to see the founder of Signal tweeting that it "doesn't have access to the data" [2] while it's not that much better than WhatsApp on the technical side.
The other thing it knows is that you have provided a proof that you are a member of the group conversation associated with the group call (basically a proof that you are allowed to join the call). And that proof is based on Signal's "zkgroup" system.
That's it. Once the server has the proof, it forwards packets for you. And it doesn't learn anything about you from the proof (other than what the proof proves: that you can join).
But isn't knowing the IP a decently good identifier? I know signal is trying to be as trustless as possible, but since IPs don't change often can't this be used to deanonymize people relatively easily? I mean supposing the server became hostile/hijacked?
This was already the case before group video calls and, yes, IPs can be used as identifiers. I don't think this can be prevented on a technical level[0] – you will always need to trust the server here. Signal does have a good track record, though: https://signal.org/bigbrother/
[0]: Unless you do p2p routing of video calls – with the known downsides regarding traffic, performance and NAT. But even then you would probably still need a server as rendez-vous point.
DPI of the network traffic at the ISPs upstream of the server side (Whether ordered by US national security letter or other) would collect some very interesting data on ASN/IP traffic origins and time of day usage patterns, however.
Or even something much less than DPI and just high sampling rate netflow.
Their messaging substrate is Signal itself, for whatever that's worth, so at least the signaling component of the system should inherit the guarantees Signal already makes. But it's a good question.
pthatcherg: this is super cool; on the Matrix side we hadn't spotted that you'd written your own SFU. Would you have any objection to us trying to use it as a decentralised Matrix SFU, as per the design in https://github.com/matrix-org/matrix-doc/blob/matthew/group-...? (Obviously we'd do so within the constraints of AGPLv3).
cool! i suppose even with limited metadata with e2ee, you could probably do a quick study to get a sense for how many calls could switch (how many are 2 party, how many aren't behind nats, etc) and maybe even get a sense for how much could be saved in operation costs.
interesting privacy considerations now that i think of it though. p2p would trivially leak the existence of a call between two devices, but i suppose that given the right taps it would be possible to correlate through server streams based on timing data anyhow.
As someone who doesn't develop video codecs, the bandwidth consumed by sending multiple video resolutions seems a poor trade-off for the bandwidth saved by receiving a reduced resolution. (This isn't a criticism, but curiosity - I'm certain the developers thought of it.)
Is it possible (or is it already done) to compress the data by sending the lowest resolution in full, and then for each higher resolution, send only the delta between it and the one immediately beneath it (i.e., resolution layer 5 would contain only the delta between it and resolution layer 4)?
Perhaps because of E2EE the final output would have to be reconstituted on the fly client-side (i.e., because the server lacks visibility into the data), which might be asking too much. Then again, the client is generating all these resolutions.
The tradeoff makes sense mostly for many-party calls - for a 1:1 call sending multiple resolutions is a waste, but for a many participant call the downlink bandwidth usage becomes much larger than the uplink.
Your layer idea is called SVC and is mentioned in the article. But VP8 doesn't support it; it is waiting on them to upgrade to VP9 or AV1.
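For a concrete picture of what the SFU does with the multiple resolutions it receives, here's a toy per-receiver selection rule: forward the highest layer that fits that receiver's estimated downlink. The layer names and bitrates below are made-up examples, not Signal's values.

```python
# Illustrative only: per-receiver simulcast layer selection in an SFU.
SIMULCAST_LAYERS = [
    {"name": "low",    "kbps": 150},
    {"name": "medium", "kbps": 500},
    {"name": "high",   "kbps": 1700},
]

def pick_layer(estimated_downlink_kbps, layers=SIMULCAST_LAYERS):
    """Return the highest layer that fits the receiver's bandwidth estimate,
    falling back to the lowest layer if nothing fits."""
    best = layers[0]
    for layer in layers:
        if layer["kbps"] <= estimated_downlink_kbps:
            best = layer
    return best

print(pick_layer(400))   # -> the "low" layer
print(pick_layer(2500))  # -> the "high" layer
```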
> The tradeoff makes sense mostly for many-party calls - for a 1:1 call sending multiple resolutions is a waste, but for a many participant call the downlink bandwidth usage becomes much larger than the uplink.
Excellent insight, thanks.
> Your layer idea is called SVC and is mentioned in the article. But VP8 doesn't support it; it is waiting on them to upgrade to VP9 or AV1.
Thanks. I noticed SVC in the article but somehow overlooked that detail.
I have been quite happy with some of my SFU code on top of the Rust wrapper for Mediasoup[0] which I understand from this post doesn't have congestion control. I would love to give this a try, but I fear including a library w/ AGPL license. The license choice is not worth hashing out here, and I'm sure Signal's goal isn't necessarily developer adoption compared to openness, but it will hurt adoption.
The article specifically mentions that they operate the infrastructure for relaying encrypted video streams for up to 40 participants.
I can also select media quality on iOS. My options are "compressed way too much" and "compressed too much". I assume you have the same options.
I would like to be able to attach images as files and have them come through unmodified. It is a general purpose communications tool; it should not be editorializing over my attachments.
I use Signal to communicate privately with my attorney. Why does anyone think tampering with evidence in transit is okay?
Apple also doesn't support open source in the App Store, so I can't fix the problem myself.
> I would like to be able to attach images as files and have them come through unmodified. It is a general purpose communications tool; it should not be editorializing over my attachments.
It's a usability trade-off, it's better for a lot of people to compress these at least slightly. The main real problem is awareness, people have to know about this so that they can put their files into .zip instead of plain .jpg.
Well, it happens inside a precompiled application which I cannot modify, and the file I selected is not the file that is received, so I am going to include "actions undertaken against my will by the unmodifiable application on my own device after I chose the file to send" as part of the "in transit" phase.
One thing that’s not clear to me (and maybe this has to do with how VP8 works?). In H.264/H.265 you have I frames (which are standalone) and P frames.
If I recall correctly, P frames require having decoded the immediately preceding frame or video corruption occurs until a new I frame is received.
With the SFU selectively choosing which resolution stream to send a frame from next, how does this work? Is the client encoding each frame as an I frame? Is there feedback from the SFU to tell the client which stream it's using so that the client knows to emit I frames (and thus generating a few corrupted frames until the stream is resynced)? Does VP8 work differently?
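VP8 has the same keyframe/predicted-frame dependency structure, so an SFU can't splice streams at arbitrary points. One common approach, sketched loosely below (illustrative, not necessarily what Signal's SFU does), is to keep forwarding the current layer, request a keyframe from the sender when a switch is wanted, and only actually switch on a keyframe of the target layer.

```python
# Loose sketch of switching a receiver between simulcast layers
# without producing corrupted frames.
class LayerSwitcher:
    def __init__(self, request_keyframe):
        self.current = 0                  # layer currently forwarded
        self.target = 0                   # layer we want to move to
        self.request_keyframe = request_keyframe   # callback toward the sender

    def want_layer(self, layer):
        if layer != self.current:
            self.target = layer
            # The decoder needs a keyframe on the new layer to start cleanly.
            self.request_keyframe(layer)

    def on_frame(self, layer, is_keyframe):
        """Return True if this frame should be forwarded to the receiver."""
        if layer == self.target and self.target != self.current:
            if is_keyframe:
                self.current = self.target    # safe point to switch
                return True
            return False                      # wait for the keyframe
        return layer == self.current          # keep forwarding the old layer
```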
Skip the SFU and use media paging with the most recent 1.0 WebRTC API - you may not need an SFU. In fact, the SFU could be a major budget destroyer when p2p meshes using participant paging are much better.
Looking at the loop, what prevents a slow client from blocking the iteration? I understand that it's UDP, so fire and forget, but shouldn't there be a thread pool to handle that in parallel?
To scale to thousands (is this even useful?) of E2E users, build a tree of participants who can remix each other's video.
Pick a handy mixing ratio like 4:1 or 9:1 (a square helps, since they compose nicely if downscaled to a grid vs. active talker stays fullscreen), and nodes with the highest bandwidth and lowest latency take M-1 streams and add them to their own to make an M:1 mix, which can be forwarded to a node closer to the root that produces another M:1 stream; the root sends a single mixed stream down the tree until every participant has the mix. Max bandwidth at each node is M down and M up. Minimum spanning tree with max M edges per node, recomputed as participants leave and join. Build 3 or 4 distinct trees and leave the connections open for more rapid switching if intermediate nodes stop participating.
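A sketch of the tree construction (purely illustrative: a greedy attach-to-best-connected rule rather than a real minimum spanning tree with latency weights): sort participants by available bandwidth and attach each node to the best-connected node that still has fewer than M children, so every node handles at most M streams up and down.

```python
# Illustrative greedy construction of an M-ary mixing tree.
def build_mix_tree(participants, m=4):
    """participants: list of (name, bandwidth_kbps).
    Returns (root, {child_name: parent_name}); the best-connected node is the root."""
    ordered = sorted(participants, key=lambda p: p[1], reverse=True)
    children = {name: [] for name, _ in ordered}
    parent = {}
    root = ordered[0][0]
    for name, _ in ordered[1:]:
        # Attach to the highest-bandwidth node that still has capacity.
        for candidate, _ in ordered:
            if candidate != name and len(children[candidate]) < m:
                children[candidate].append(name)
                parent[name] = candidate
                break
    return root, parent

root, parent = build_mix_tree(
    [("alice", 50000), ("bob", 20000), ("carol", 8000), ("dan", 3000), ("eve", 1000)],
    m=2,
)
print(root, parent)   # alice is the root; the others hang off the best-connected nodes
```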
How long does that last? Telegram uses a few hundred million dollars each year, although they are significantly larger.
> As Telegram approaches 500 million active users, many of you are asking the question – who is going to pay to support this growth? After all, more users mean more expenses for traffic and servers. A project of our size needs at least a few hundred million dollars per year to keep going.
> needs at least a few hundred million dollars per year to keep going
I'm pretty sure that is not server cost. This is probably the standard approach of companies hiring tons of personnel and spending tens of thousands or hundreds of thousands on ads every single day.
Hopefully proper client sync and cross-platform migration are on the list.
Two of the biggest issues for me with Signal have always been the incredibly poor client sync which I've heard is intentionally crippled to the point of being useless because of some hand-wave-y privacy reasons.
Also, the inability to move between Android and iOS. I lost years worth of conversations when I switched from Android. Worse, one day my Signal client on Android started crashing on startup. I emailed support@ with logcat files and never heard a peep. That lack of support for probably one of the most serious use issues one could have left a permanent bad taste in my mouth.
I don't care whatever paranoid argument someone at Signal has for why I can't sync my desktop client to my phone. Stop being paternalistic and let me decide if the tradeoff security-wise is worth it to me, please.
This engineering post is all for nothing given that Signal is still a centralized messaging service and still requires your phone number to sign in/up which means users are still vulnerable to SIM swapping attacks.
Hence this: the service can be 'shut down' at any time, making Signal even more uninteresting to use. (Probably affected by the Log4j RCE stuff)
In theory the code could be made reusable, but it probably was structured in a very much un-reusable way and will never be available to other projects.
Everything has to be some giant monolithic integration because if you engineered it properly as a reusable library, it wouldn't provide any business benefit to the authors. It's almost a form of closed source by obfuscation through over-integrated design.
"Full mesh: Each call participant sends its media (audio and video) directly to each other call participant. This works for very small calls, but does not scale to many participants. Most people just don't have an Internet connection fast enough to send 40 copies of their video at the same time.
Server mixing: Each call participant sends its media to a server. The server "mixes" the media together and sends it to each participant. This works with many participants, but is not compatible with end-to-end encryption because it requires that the server be able to view and alter the media.
Selective Forwarding: Each participant sends its media to a server. The server "forwards" the media to other participants without viewing or altering it. This works with many participants, and is compatible with end-to-end-encryption."
Imagine an end user who is interested in "very small calls" with friends and family. She is not interested in communicating to an infinitely large audience ("broadcasting"). She never has group calls on Signal with 40 people, even ones with Grandmother and full extended family are well under 40 people. We have to use our imagination because this user does not actually exist.
The imaginary user reads this blog post and she thinks to herself, "Full mesh sounds like the best design. There is less/no reliance on a third party; traffic does not need to be sent to an additional third-party server." With full mesh, there is no need to mention the caveat "without viewing or altering it", or selectively choosing not to forward it to certain recipients. Full mesh seems to give the user the most control and require the least dependence on third-party servers (not necessarily "no dependence", but the least dependence).
Then she reads this line: "Because Signal must have end-to-end encryption and scale to many participants, we use selective forwarding."
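For a back-of-the-envelope sense of why full mesh stops scaling, here is a rough per-participant bandwidth comparison of the topologies quoted above (the 1.5 Mbps per video stream is an illustrative assumption, not a figure from Signal, and simulcast is ignored):

```python
# Rough per-participant bandwidth for each topology, in Mbps.
STREAM_MBPS = 1.5  # illustrative per-stream bitrate

def full_mesh(n):
    # Send a copy to every other participant; receive one from each.
    return {"up": (n - 1) * STREAM_MBPS, "down": (n - 1) * STREAM_MBPS}

def selective_forwarding(n):
    # Send one copy to the server; receive everyone else's stream from it.
    return {"up": STREAM_MBPS, "down": (n - 1) * STREAM_MBPS}

for n in (5, 10, 40):
    print(n, "participants:", "mesh", full_mesh(n), "sfu", selective_forwarding(n))
```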
Just an FYI, full mesh would still require communicating with a third-party server, at the very least for initial networking when joining/leaving a group call.
The whole point of E2E encryption is so that passing data through a third party shouldn't matter in the first place.
And lastly, even when you have just a 1:1 video chat, sending and receiving full resolution/quality multimedia can still be way too much for some people's internet connections. UX is extremely important for Signal, as unreliable video chat is a surefire way for those less caring about privacy to hop back over to a privacy-violating alternative.
I feel sorry for those working on bringing security/privacy to everyone, as they have to appease power users and privacy absolutists, along with one's grandmother and the TikTok generation.
I guess if there is enough interest from users, we could look into adding a full mesh mode. It's been a year so far with group calls and no one has asked for it, probably because there isn't really much to be gained. Even 1:1 calls, which are p2p, sometimes need to be relayed through a TURN server.
"Even 1:1 calls, which are p2p, sometimes need to be relayed through a TURN server."
What percentage is "sometimes"? For example, maybe it is only where two callers are behind the same NAT or some other situation that could be rare relative to the majority of calls.
In any event, it sounds like TURN is not used unless it is needed. The same idea could apply with server forwarding, where forwarding is only used if it is needed. For a small group call, forwarding might not be necessary and full mesh might suffice.
As one user, I would be interested in a full mesh option alongside a server forwarding option.
There is P2P software for small groups I have always used that has a traffic forwarding (relaying) option after the peers are connected, but the default is direct connection, with no need for a relay. The forwarding option is useful for situations where it is needed, but it is not the default. Personally, I have never had to use it.
We don't have a percentage because for 1:1 calls, we don't even know how many calls there are (it's completely p2p), but I've often heard quoted somewhere between 5-10% of calls.
Yes, TURN is only used if needed in those cases, and, yes, you could use a forwarding server instead of a TURN server (although with different tradeoffs).
I'll add the full mesh option to our list of options for future work that we discuss regularly.
Yes, that's called "signaling". In the case of Signal, it's done through Signal messages, which pass through a server. But the server doesn't know they are calling related messages. They could be any message.