Oh, this all brings back memories of Sococo in the 2000s. We faced all these problems and had similar solutions to them all.
We even had a rapidly adapting network make-and-break recovery layer. You unplug your laptop from a wired connection, switch to wireless - we recovered in milliseconds. You heard barely a click.
The encryption issue is fun - we had a rotate-key message in-band. The receiver loaded new keys and tried them in sequence to ease the turnover time - out-of-order packets etc could make it ambiguous for a short while which key to use. A cache and aging keys out made it work pretty well.
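A minimal sketch of that kind of receiver-side key trial (illustrative only, not the actual Sococo code), using AES-GCM from Python's `cryptography` package: the receiver keeps a small cache of recent keys, tries them newest-first on each packet, and ages out keys that haven't matched in a while.

```python
# Hypothetical sketch of trying recently rotated keys in sequence.
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

class RotatingKeyReceiver:
    def __init__(self, max_age_seconds=30.0):
        self.keys = []              # list of [key_bytes, last_used_timestamp], newest first
        self.max_age = max_age_seconds

    def add_key(self, key):
        """Called when a rotate-key message arrives in-band."""
        self.keys.insert(0, [key, time.monotonic()])

    def decrypt(self, nonce, ciphertext):
        """Try each cached key newest-first; out-of-order packets may still
        need an older key for a short while after a rotation."""
        now = time.monotonic()
        for entry in self.keys:
            key, _ = entry
            try:
                plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
                entry[1] = now      # refresh: this key is still in use
                self._expire(now)
                return plaintext
            except InvalidTag:
                continue            # wrong key (or corrupt packet); try the next one
        raise InvalidTag("no cached key matched")

    def _expire(self, now):
        # Age out keys that haven't decrypted anything recently.
        self.keys = [e for e in self.keys if now - e[1] < self.max_age]
```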
Remixing on user stations proved to be problematic (mentioned elsewhere on this thread). You'd think if 6 people at one site were conferencing with a dozen elsewhere, you could elect one at each site to mix-and-forward. But corporate networks made it hard to determine who was 'adjacent' - they were often layered, and without UPnP (is that what the router protocol is called?) you couldn't tell if somebody at the next desk was even in your company.
We had up to 100 people in a conference, and our enter-the-conference time was on the order of 100ms. Click into an all-hands, and be able to hear everybody before your finger left the mouse button. It was wonderful.
Sococo today is a sad shadow of that. They went open-source and lost all our IP instantly. Just another WebRTC client last I knew.
There was little or nothing in WebRTC to match what we'd spent 5 years creating. So they were back to 1-5 people in a conference, with 1-3 second connect times, and no resilience to network changes.
The excuse they gave was "We can't rely on 6 people in Iowa for our core IP". So they switched to some open source mix node that was the pet project of 2 guys in Italy. Two academics, who gave it hardly any attention. And it had zero IP; just a collection of APIs stitched together to give you the impression of having a mix node.
We said all that at the time. But such was the power of the magic words "Open Source" that it all bounced off their mental shields.
Maybe I'm kinda leaning beyond the practical/relevance limits of "old code still interesting", but could you open-source the implementation you came up with given the passage of time? Lots of people are stuck on low-bandwidth links so a codebase optimized for slower connections would absolutely fly (and consistently so, for everyone, under lots of conditions), and everyone always wants to use less bandwidth anyway.
From an implementational perspective it's also always good to have explicitly bespoke designs out there to contrast against the bog-standardness of WebRTC with its standard "can't be helped" set of limitations and flaws.
I also find it very fascinating to hear that OSS was the cause of (headdesk-inducing) myopia and blindsiding. My (naive, distant, apparently out of date) impression was that open-source was incorrectly perceived as the inferior option in the stereotypical case. I guess the entropy pool really can go in all the directions...
> Lots of people are stuck on low-bandwidth links so a codebase optimized for slower connections would absolutely fly (and consistently so, for everyone, under lots of conditions), and everyone always wants to use less bandwidth anyway.
You would think so but why have solutions like Mumble[0], which allows for extremely low-latency and high-quality voice calls and has existed at least since the mid 2000s, not become more popular during the pandemic? Or why didn't people at Zoom and MS Teams at least learn from Mumble?
> You would think so but why have solutions like Mumble[0], which allows for extremely low-latency and high-quality voice calls and has existed at least since the mid 2000s, not become more popular during the pandemic?
Because there is no official Mumble server.
People know how to download an application, click "install", and register an account. But ask them to open a port on their router's firewall/NAT, or set up DNS, and you instantly lose 99.9% of your user base.
It could have been different, but lay people never had the chance to install their own server. They couldn't do it with Dial Up, they didn't have the upload bandwidth with ADSL, they didn't have fixed IP addresses, there's the NAT hurdle, outgoing SMTP is blocked everywhere… that ship has sailed. Even I host my websites on a remote virtual machine I rent.
Mumble has a bunch of issues, the main one being its confusing UI, especially on the mobile clients. I'm regularly in Mumble conferences, and accidentally switching rooms instead of pressing the push-to-talk button, for example, happens quite often.
There's also a bunch of more technical problems. For example, since Mumble uses a UDP protocol, it handles dropped frames badly, like on a spotty wireless connection. The result is missing audio. Not only does sound fail to arrive (in both directions), it also doesn't tell you something's wrong.
Mumble has problems with changing audio setups, which requires a restart of the client.
Also no video, no simply calling people - you need to go to a server and they need to go there, too. They basically stopped innovating some time ago and everyone else moved on.
>For example, since Mumble uses a UDP protocol, it handles dropped frames badly, like on a spotty wireless connection.
WebRTC also uses UDP—as well as virtually every other real-time conferencing platform since the internet existed. TCP is too constraining to use for voice because every single packet retransmission only increases delay further. Dropping packets when they don’t arrive on time is necessary in order to minimize delay, which is one of the principal goals of Mumble.
The real solution to dropped packets is not TCP—it’s a quality jitter buffer, and if you don’t like mumble’s performance in that respect then you need to look at the JB. A good JB will buffer and reorder packets within some statistical measure of network jitter, but the behavior is very explicitly not to retransmit.
Google cheated with their implementation of WebRTC by purchasing Global IP Solutions, which gave them the most advanced jitter buffer in the world at the time: NetEQ.
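To make the "reorder, but never retransmit" behavior concrete, here is a toy jitter buffer sketch (nowhere near NetEQ, just the core idea): packets are held briefly and reordered by sequence number, and anything that misses its playout deadline is concealed rather than retransmitted.

```python
# Toy jitter buffer: reorder within a small window, never retransmit.
class JitterBuffer:
    def __init__(self, depth=3):
        self.depth = depth          # how many frames of delay we tolerate
        self.buffer = {}            # seq -> payload
        self.next_seq = None        # next sequence number to play out

    def push(self, seq, payload):
        if self.next_seq is None:
            self.next_seq = seq
        if seq >= self.next_seq:    # drop anything that arrives too late
            self.buffer[seq] = payload

    def pop(self):
        """Called once per playout tick. Returns a payload or None
        (None = wait a bit longer, or conceal a lost frame)."""
        if self.next_seq is None:
            return None
        payload = self.buffer.pop(self.next_seq, None)
        if payload is None and len(self.buffer) < self.depth:
            # Not due yet: keep waiting for the missing packet.
            return None
        # Either we have the packet, or we give up on it and move on.
        self.next_seq += 1
        return payload
```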
Long enough mumble sessions will desync and the developers don't want to do anything about it. Mobile clients all suck. Mumble is great, but it has flaws too :)
When arrogant MBAs make technology decisions they don't understand and aren't concerned with the details because "their way is best".
Sorry to burst their bubble, but most of the tech world operates on a "6 people in Iowa" for niche technologies.
It seems often in history that better technologies are lost as the wheel is reinvented for no apparent reason, either due to NIH syndrome, political/business concerns, or out of sheer ignorance. Hubris to the overconfident and a loss for humanity are often the results.
Some of the engineers involved may have some of the components. Tom, I know, has the desktop client. Chris may have a mix node machine in a closet somewhere :)
The backend I'm not so sure about. The guy who wrote most of that is a middle manager at Microsoft. My son wrote some, but he didn't keep any of it.
It did take a lot of screen real estate, when open.
But it had its advantages. You could see who was talking with whom - the bobble-heads even blinked. You could visually show when you were busy (close office door). A company meeting took seconds for everyone to assemble - you'd see folks blinking into the big room.
I had a separate monitor for leaving it up, but I had the space for that. It was always an issue.
>> There is no off the shelf software that would allow us to support calls of that size while ensuring that all communication is end-to-end encrypted, so we built our own open source Signal Calling Service to do the job
But wasn't there Jitsi? [0]
I think it's great we have competition among Free Software projects so that both can improve. But sometimes I feel like maybe duplicated efforts create two 5/10 solutions. Instead, what we really want is one 8/10 solution, or better.
There is some duplication of effort but sometimes progress happens via rewrites and that might actually be a faster way to an 8/10 system than direct collaboration?
Also I think it’s interesting to see how this builds on Google’s work (the googcc algorithm). Which of course builds on previous open source work. The underlying technical collaboration happens even with quite different organizational goals and different codebases.
All of these things are built around WebRTC to some degree, which was built on previous open source work and standards. We all benefit from many contributions over many years.
Same, all my hobby groups switched from Jitsi to Google Meet due to disconnect issues. And Meet is actually pretty good. Can't compare to Zoom as I don't know anybody using it.
It's the first of the links where they say "When building support for group calls, we evaluated many open source SFUs", so I suppose it's either not one of the two with "adequate congestion control", or is the one that did not reliably scale past 8 participants?
Can you elaborate on the scaling issues you hit (and with which implementations?) I've used Janus + MediaSoup but not Jitsi before for WebRTC audio for web based multiplayer games.
Daily.co has a developer friendly offering that accomplishes this as well. Many offerings available and many reasons to not take on this added complexity.
Signal uses phone numbers as identifiers because it helps control spam. There's no reason - at all - they couldn't just throw the gates open and say "we hand out GUIDs per account and you do some auth process to tie them together to a master key" - but that's not why they're not implementing it.
Signal without the phone number requirement becomes THE way to move data from anything to anything - malware would be written which set up a Signal account and used that for command and control. I'd wire all my service messages on servers to go via Signal - it's the perfect mechanism, tied to the device I want them on.
The answer is, Signal doesn't have a way to offer "usernames only" which wouldn't almost immediately explode the services usage far, far beyond what they can plausibly sustain in server costs.
EDIT: Signal desperately needs to figure out an actual monetization scheme for use-cases which do not need absolute privacy. Billing businesses for verified access to the service would be one way. Ideology and all is great, but if they can't pay the bills none of it is going to matter.
Users aren't personally identifiable by contacts or their address
Communication is authenticated and private
No person or server can access contact lists, message history, or other metadata
Resist censorship and monitoring at the local network level
Resist blacklisting or denial of service against users
Accessible and understandable for non-technical users
Reliability and interactivity comparable with traditional IM services
OK, the accessibility part is not implemented, but that is beside the point. I am not arguing in favor of using Ricochet here, so it does not matter anyway. FWIW, it has been audited.
> The answer is, Signal doesn't have a way to offer "usernames only" which wouldn't almost immediately explode the services usage far, far beyond what they can plausibly sustain in server costs.
Why can’t they let heavy users run their own servers?
> Signal without the phone number requirement becomes THE way to move data from anything to anything - malware would be written which setup a Signal account and used that for command and control.
Element doesn't require a phone number and I have no malware issues... also email doesn't.
I do not even know what the problem is. Being able to create as many accounts as one wants? How is that used for "command and control" though? I am missing something here I think.
I don't see it either.. maybe a Signal employee posted this comment to try to justify phone number collection because they can sell those at a premium (or a secret order from the Gov. forces them to do it).
Also, what I want is accounts protected solely by username and password (actually, just a 50 character password without login would satisfy me)... this is one reason why I am slowly moving away from gmail. 2FA doesn't help me because I use my own device to login and if they get compromised, I am screwed either way.
i call this a bs strawman. this is just nonsense because, as you said, element and email are perfect examples of decentralized networks. you also have the fediverse that works on trust between a local network as well as any trusted servers, and still that system doesn't just explode.
> Signal uses phone numbers as identifiers because it controls spam.
Signal uses phone number as identifiers because it's the easiest way to find other users. There's no other reason Signal uses phone numbers as a primary ID.
I wonder if I ever needed contact discovery, as in, actual contact discovery, such as: you start typing foo, then you get "foobar", "foobaz", and so forth. It definitely complicates things.
In India at least, unused numbers are recycled back to new subscribers after 6 or so months of no activity. Blocking merely on the basis of the phone number would create issues in the future, though.
Usernames work. You could even use UUIDs these days as QR is an increasingly common way of sharing data. But yeah, usernames would be a great improvement.
Usernames do not just work. The Signal team is not unaware of usernames and Signal is not a weird scheme to get all your phone numbers. The difference between Signal and systems that use usernames (or email addresses) is that Signal deliberately doesn't operate a serverside directory or buddy list service. By contrast, other relatively popular messengers essentially keep a plaintext database of who talks to who on their service.
What phone numbers allow Signal to do is to piggyback off the contact lists people already have on their devices.
I have like 6 phone numbers listed in my phone contact list for my sister alone, because she changes numbers every time she job hops and gets a new work phone, so I really question how well phone numbers work either. Most of those numbers are no longer hers, but I've lost track of which ones.
I feel like there's a bubble of people for whom phone numbers are perhaps a useful, durable identity token, but I really think it's very much a bubble. Most people's phone numbers change fairly frequently.
> Most people's phone numbers change fairly frequently.
Citation?
I think it’s just as easy to claim (and I certainly would bet that) most people’s phone numbers do not change frequently - I’d wager on something like 70% remain the same for 5+ years.
My phone number has literally never changed (going 20+ years now). Perhaps it's different in the US somehow (e.g. people use their work number for private purposes too), but in my experience private numbers tend to remain the same.
Not only that, but how can I use Signal on desktop without it? Can I do that? Plus every single citizen here who has a phone number has it tied to their identity, even if they buy the SIM card at the gas station, for example. They have to verify and confirm their identity once a year. Its purpose is to keep track of who uses what number.
You can use signal on a desktop but you still need to tie your account to a phone number. It has a mechanism to share your history and all that to a desktop app, with key certification managed (afaik) from your phone app (and it requires periodic re-pairing of it if you're not using it regularly). It's not really a great experience imo.
You can here too, I just know a lot of people who don't. And people who use their work phone (even if I think that's a bad idea) for personal use don't always have the option of keeping the number.
I operate an E2EE messaging service that uses email instead of phone numbers as user IDs
My servers also don't keep any record of conversations or "buddy lists". Yes, contact lists get wiped when changing device and email addresses aren't as easily stored in phone books (I don't even access a user's phone book for that matter). Just saying that it's possible to not use phone numbers. Granted not as convenient once you've built your entire user base off of their phone numbers
So the Signal server has no information about phone number X talking to or being in contact with phone number Y at all?
> By contrast, other relatively popular messengers essentially keep a plaintext database of who talks to who on their service.
I know, there are plenty of not-so-privacy-preserving messengers out there. The way Ricochet[1] and Briar[2] does it is probably the most privacy-preserving one, and it can be made extremely convenient.
It's rather funny to me that I still get notifications that someone I knew ages ago (but who is still in my contact list) is now on Signal.
I find it funny rather than scary because I knew the tradeoffs Signal decided on even before I started using it. The fact that you use Signal itself is not a secret. If you need that to be secret, use a burner phone or something else. But I'm pretty sure not everyone else knew that when they joined.
In one case, I got notification that a guy I knew in high school had joined Signal, he was pretty far left then, and googling him I found out he was extremely far left now (splinter of a splinter of an anti-electoral Maoist group). I sent him a friendly note welcoming him, and explaining the basic thing I've explained here, that everyone can see you joining signal.
One way would be for Signal to create a hash of all your contacts' phone numbers and look up whether those hashes exist on the server. No contact details needed on the server, just the hash of the phone number connected to Signal's user ID.
> The first instinct is often just to hash the contact information before sending it to the server. If the server has the SHA256 hash of every registered user, it can just check to see if those match any of the SHA256 hashes of contacts transmitted by a client.
> Unfortunately, this doesn’t work because the “preimage space” (the set of all possible hash inputs) is small enough to easily calculate a map of all possible hash inputs to hash outputs. There are only roughly 10^10 phone numbers, and while the set of all possible email addresses is less finite, it’s still not terribly great. Inverting these hashes is basically a straightforward dictionary attack. It’s not possible to “salt” the hashes, either (they always have to match), which makes building rainbow tables possible.
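The dictionary attack described in that quote is easy to demonstrate; here's a tiny sketch (with a deliberately shrunken search space) that inverts a SHA-256 phone-number hash by brute force, which is exactly why plain hashing doesn't anonymize phone numbers:

```python
# Demonstration of why hashing phone numbers is not private:
# the input space is small enough to enumerate.
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# Pretend this is a hash the server received from a client's contact list.
secret_number = "+15551230042"
leaked_hash = sha256_hex(secret_number)

# Attacker enumerates the (tiny, for demo purposes) candidate space.
# A real attacker would enumerate roughly 10^10 numbers, which is cheap.
for n in range(10000):
    candidate = f"+1555123{n:04d}"
    if sha256_hex(candidate) == leaked_hash:
        print("recovered:", candidate)
        break
```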
Regardless of this, Ricochet and Briar both use Tor hidden services and they are metadata-free. You do not send any metadata to any servers. I have a link to the design of Ricochet that is easily digestible in some of my other comments.
Usernames aren't the solution to anonymity. Your username here isn't anonymous. And how do you share a username without connecting another identity? It's the same problem as using a phone number (supposing you're American; it's not too different if your ID is connected to your phone number).
I would rather prefer doing it the Ricochet way, yes:
> The recipient can calculate the sender's contact ID based on the public key, and authenticate it by verifying the signature on the request. This proves that the sender can publish the hidden service represented by their contact ID.
This is using IDs. For convenience of sharing and receiving: could be copy pasted (share button works on smartphones, too), and QR codes could be used.
Ricochet is metadata-free, and it can (or is, actually) be resistant to traffic analysis, too. No one knows who you are, and no one knows who you talk to.
Can you explain this a little more? Because the problem I see is that with any unique identifier it is hard to share in a hostile environment. Let's say I want to share my contact information with someone here but also share my contact information with someone on Reddit and not reveal that I am godelski on HN. Because if I use the identifier "1234" here and "1234" on Reddit then I've connected those two accounts.
The only way I see it working out is with temporary identifiers or expiring links like FF Send used to have. I saw some users talking about this in the Signal community forum[0]. I agree with the users who are trying to say they want to use usernames to be anonymous, not as a way to hide a phone number.
I do not think that OP was referring to implementing it the same, or even in a similar way, but to use a username/password pair. OP is free to correct me if I am wrong though.
In any case, elimination of metadata done right is the way Ricochet[1] does it. The recipient can calculate the sender's contact ID based on the public key, and authenticate it by verifying the signature on the request. This proves that the sender can publish the hidden service represented by their contact ID. You can read more about it here: https://github.com/ricochet-im/ricochet/blob/master/doc/desi...!
I don't know what their threat model is, but it's interesting that they don't seem too bothered about reducing metadata collection potential on the server. I bet you could put together some pretty interesting graphs of who is talking to whom, how much they talk and when.
I'm interested in an explanation for that too. Signal recently introduced the same sender key method for groups [1] and, correct me if I'm wrong, this is the first formal mention of the "Selective Forwarding" method we have.
I get that Signal's open-source implementation doesn't store this type of metadata, but it is very misleading to see the founder of Signal tweeting that it "doesn't have access to the data" [2] while it's not that much better than WhatsApp on the technical side.
The other thing it knows is that you have provided a proof that you are a member of the group conversation associated with the group call (basically a proof that you are allowed to join the call). And that proof is based on Signal's "zkgroup" system.
That's it. Once the server has the proof, it forwards packets for you. And it doesn't learn anything about you from the proof (other than what the proof proves: that you can join).
But isn't knowing the IP a decently good identifier? I know signal is trying to be as trustless as possible, but since IPs don't change often can't this be used to deanonymize people relatively easily? I mean supposing the server became hostile/hijacked?
This was already the case before group video calls and, yes, IPs can be used as identifiers. I don't think this can be prevented on a technical level[0] – you will always need to trust the server here. Signal does have a good track record, though: https://signal.org/bigbrother/
[0]: Unless you do p2p routing of video calls – with the known downsides regarding traffic, performance and NAT. But even then you would probably still need a server as rendez-vous point.
DPI of the network traffic at the ISPs upstream of the server side (Whether ordered by US national security letter or other) would collect some very interesting data on ASN/IP traffic origins and time of day usage patterns, however.
Or even something much less than DPI and just high sampling rate netflow.
Their messaging substrate is Signal itself, for whatever that's worth, so at least the signaling component of the system should inherit the guarantees Signal already makes. But it's a good question.
pthatcherg: this is super cool; on the Matrix side we hadn't spotted that you'd written your own SFU. Would you have any objection to us trying to use it as a decentralised Matrix SFU, as per the design in https://github.com/matrix-org/matrix-doc/blob/matthew/group-...? (Obviously we'd do so within the constraints of AGPLv3).
cool! i suppose even with limited metadata with e2ee, you could probably do a quick study to get a sense for how many calls could switch (how many are 2 party, how many aren't behind nats, etc) and maybe even get a sense for how much could be saved in operation costs.
interesting privacy considerations now that i think of it though. p2p would trivially leak the existence of a call between two devices, but i suppose that given the right taps it would be possible to correlate through server streams based on timing data anyhow.
As someone who doesn't develop video codecs, the bandwidth consumed by sending multiple video resolutions seems a poor trade-off for the bandwidth saved by receiving a reduced resolution. (This isn't a criticism, but curiosity - I'm certain the developers thought of it.)
Is it possible (or is it already done) to compress the data by sending the lowest resolution in full, and then for each higher resolution, send only the delta between it and the one immediately beneath it (i.e., resolution layer 5 would contain only the delta between it and resolution layer 4)?
Perhaps because of E2EE the final output would have to be reconstituted on the fly client-side (i.e., because the server lacks visibility into the data), which might be asking too much. Then again, the client is generating all these resolutions.
The tradeoff makes sense mostly for many-party calls - for a 1:1 call sending multiple resolutions is a waste, but for a many participant call the downlink bandwidth usage becomes much larger than the uplink.
Your layer idea is called SVC and is mentioned in the article. But VP8 doesn't support it; it is waiting on them to upgrade to VP9 or AV1.
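For a concrete picture of what the SFU does with the multiple resolutions it receives, here's a toy per-receiver selection rule: forward the highest layer that fits that receiver's estimated downlink. The layer names and bitrates below are made-up examples, not Signal's values.

```python
# Illustrative only: per-receiver simulcast layer selection in an SFU.
SIMULCAST_LAYERS = [
    {"name": "low",    "kbps": 150},
    {"name": "medium", "kbps": 500},
    {"name": "high",   "kbps": 1700},
]

def pick_layer(estimated_downlink_kbps, layers=SIMULCAST_LAYERS):
    """Return the highest layer that fits the receiver's bandwidth estimate,
    falling back to the lowest layer if nothing fits."""
    best = layers[0]
    for layer in layers:
        if layer["kbps"] <= estimated_downlink_kbps:
            best = layer
    return best

print(pick_layer(400))   # -> the "low" layer
print(pick_layer(2500))  # -> the "high" layer
```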
> The tradeoff makes sense mostly for many-party calls - for a 1:1 call sending multiple resolutions is a waste, but for a many participant call the downlink bandwidth usage becomes much larger than the uplink.
Excellent insight, thanks.
> Your layer idea is called SVC and is mentioned in the article. But VP8 doesn't support it; it is waiting on them to upgrade to VP9 or AV1.
Thanks. I noticed SVC in the article but somehow overlooked that detail.
I have been quite happy with some of my SFU code on top of the Rust wrapper for Mediasoup[0] which I understand from this post doesn't have congestion control. I would love to give this a try, but I fear including a library w/ AGPL license. The license choice is not worth hashing out here, and I'm sure Signal's goal isn't necessarily developer adoption compared to openness, but it will hurt adoption.
The article specifically mentions that they operate the infrastructure for relaying encrypted video streams for up to 40 participants.
I can also select media quality on iOS. My options are "compressed way too much" and "compressed too much". I assume you have the same options.
I would like to be able to attach images as files and have them come through unmodified. It is a general purpose communications tool; it should not be editorializing over my attachments.
I use Signal to communicate privately with my attorney. Why does anyone think tampering with evidence in transit is okay?
Apple also doesn't support open source in the App Store, so I can't fix the problem myself.
> I would like to be able to attach images as files and have them come through unmodified. It is a general purpose communications tool; it should not be editorializing over my attachments.
It's a usability trade-off, it's better for a lot of people to compress these at least slightly. The main real problem is awareness, people have to know about this so that they can put their files into .zip instead of plain .jpg.
Well, it happens inside a precompiled application which I cannot modify, and the file I selected is not the file that is received, so I am going to include "actions undertaken against my will by the unmodifiable application on my own device after I chose the file to send" as part of the "in transit" phase.
One thing that’s not clear to me (and maybe this has to do with how VP8 works?). In H.264/H.265 you have I frames (which are standalone) and P frames.
If I recall correctly, P frames require having decoded the immediately preceding frame or video corruption occurs until a new I frame is received.
With the SFU selectively choosing which resolution stream to send a frame from next, how does this work? Is the client encoding each frame as an I frame? Is there feedback from the SFU to tell the client which stream it's using so that the client knows to emit I frames (and thus generating a few corrupted frames until the stream is resynced)? Does VP8 work differently?
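VP8 has the same keyframe/predicted-frame dependency structure, so an SFU can't splice streams at arbitrary points. One common approach, sketched loosely below (illustrative, not necessarily what Signal's SFU does), is to keep forwarding the current layer, request a keyframe from the sender when a switch is wanted, and only actually switch on a keyframe of the target layer.

```python
# Loose sketch of switching a receiver between simulcast layers
# without producing corrupted frames.
class LayerSwitcher:
    def __init__(self, request_keyframe):
        self.current = 0                  # layer currently forwarded
        self.target = 0                   # layer we want to move to
        self.request_keyframe = request_keyframe   # callback toward the sender

    def want_layer(self, layer):
        if layer != self.current:
            self.target = layer
            # The decoder needs a keyframe on the new layer to start cleanly.
            self.request_keyframe(layer)

    def on_frame(self, layer, is_keyframe):
        """Return True if this frame should be forwarded to the receiver."""
        if layer == self.target and self.target != self.current:
            if is_keyframe:
                self.current = self.target    # safe point to switch
                return True
            return False                      # wait for the keyframe
        return layer == self.current          # keep forwarding the old layer
```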
Skip the SFU and use media paging with the most recent 1.0 WebRTC API - you may not need an SFU. In fact, the SFU could be a major budget destroyer when p2p meshes using participant paging are much better.
Looking at the loop, what prevents a slow client from blocking the iteration? I understand that it's UDP, so fire and forget, but shouldn't there be a thread pool to handle that in parallel?
To scale to thousands (is this even useful?) of E2E users, build a tree of participants who can remix each other's video.
Pick a handy mixing ratio like 4:1 or 9:1 (a square helps, since they compose nicely if downscaled to a grid vs. active talker stays fullscreen), and nodes with the highest bandwidth and lowest latency take M-1 streams and add them to their own to make an M:1 mix, which can be forwarded to a node closer to the root that produces another M:1 stream; the root sends a single mixed stream down the tree until every participant has the mix. Max bandwidth at each node is M down and M up. Minimum spanning tree with max M edges per node, recomputed as participants leave and join. Build 3 or 4 distinct trees and leave the connections open for more rapid switching if intermediate nodes stop participating.
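A sketch of the tree construction (purely illustrative: a greedy attach-to-best-connected rule rather than a real minimum spanning tree with latency weights): sort participants by available bandwidth and attach each node to the best-connected node that still has fewer than M children, so every node handles at most M streams up and down.

```python
# Illustrative greedy construction of an M-ary mixing tree.
def build_mix_tree(participants, m=4):
    """participants: list of (name, bandwidth_kbps).
    Returns (root, {child_name: parent_name}); the best-connected node is the root."""
    ordered = sorted(participants, key=lambda p: p[1], reverse=True)
    children = {name: [] for name, _ in ordered}
    parent = {}
    root = ordered[0][0]
    for name, _ in ordered[1:]:
        # Attach to the highest-bandwidth node that still has capacity.
        for candidate, _ in ordered:
            if candidate != name and len(children[candidate]) < m:
                children[candidate].append(name)
                parent[name] = candidate
                break
    return root, parent

root, parent = build_mix_tree(
    [("alice", 50000), ("bob", 20000), ("carol", 8000), ("dan", 3000), ("eve", 1000)],
    m=2,
)
print(root, parent)   # alice is the root; the others hang off the best-connected nodes
```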
How long does that last? Telegram uses a few hundred million dollars each year, although they are significantly larger.
> As Telegram approaches 500 million active users, many of you are asking the question – who is going to pay to support this growth? After all, more users mean more expenses for traffic and servers. A project of our size needs at least a few hundred million dollars per year to keep going.
> needs at least a few hundred million dollars per year to keep going
I'm pretty sure that is not server cost. This is probably the standard approach of companies hiring tons of personnel and spending tens of thousands or hundreds of thousands on ads every single day.
Hopefully proper client sync and cross-platform migration are on the list.
Two of the biggest issues for me with Signal have always been the incredibly poor client sync which I've heard is intentionally crippled to the point of being useless because of some hand-wave-y privacy reasons.
Also, the inability to move between Android and iOS. I lost years worth of conversations when I switched from Android. Worse, one day my Signal client on Android started crashing on startup. I emailed support@ with logcat files and never heard a peep. That lack of support for probably one of the most serious use issues one could have left a permanent bad taste in my mouth.
I don't care whatever paranoid argument someone at Signal has for why I can't sync my desktop client to my phone. Stop being paternalistic and let me decide if the tradeoff security-wise is worth it to me, please.
This engineering post is all for nothing given that Signal is still a centralized messaging service and still requires your phone number to sign in/up which means users are still vulnerable to SIM swapping attacks.
Hence this: the service can be 'shut down' at any time, making Signal even more uninteresting to use. (Probably affected by the Log4j RCE stuff)
In theory the code could be made reusable, but it probably was structured in a very much un-reusable way and will never be available to other projects.
Everything has to be some giant monolithic integration because if you engineered it properly as a reusable library, it wouldn't provide any business benefit to the authors. It's almost a form of closed source by obfuscation through over-integrated design.
"Full mesh: Each call participant sends its media (audio and video) directly to each other call participant. This works for very small calls, but does not scale to many participants. Most people just don't have an Internet connection fast enough to send 40 copies of their video at the same time.
Server mixing: Each call participant sends its media to a server. The server "mixes" the media together and sends it to each participant. This works with many participants, but is not compatible with end-to-end encryption because it requires that the server be able to view and alter the media.
Selective Forwarding: Each participant sends its media to a server. The server "forwards" the media to other participants without viewing or altering it. This works with many participants, and is compatible with end-to-end-encryption."
Imagine an end user who is interested in "very small calls" with friends and family. She is not interested in communicating to an infinitely large audience ("broadcasting"). She never has group calls on Signal with 40 people, even ones with Grandmother and full extended family are well under 40 people. We have to use our imagination because this user does not actually exist.
The imaginary user reads this blog post and she thinks to herself, "Full mesh sounds like the best design. There is less/no reliance on a third party; traffic does not need to be sent to an additional third-party server." With full mesh, there is no need to mention the caveat "without viewing or altering it", or selectively choosing not to forward it to certain recipients. Full mesh seems to give the user the most control and require the least dependence on third-party servers (not necessarily "no dependence", but the least dependence).
Then she reads this line: "Because Signal must have end-to-end encryption and scale to many participants, we use selective forwarding."
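For a back-of-the-envelope sense of why full mesh stops scaling, here is a rough per-participant bandwidth comparison of the topologies quoted above (the 1.5 Mbps per video stream is an illustrative assumption, not a figure from Signal, and simulcast is ignored):

```python
# Rough per-participant bandwidth for each topology, in Mbps.
STREAM_MBPS = 1.5  # illustrative per-stream bitrate

def full_mesh(n):
    # Send a copy to every other participant; receive one from each.
    return {"up": (n - 1) * STREAM_MBPS, "down": (n - 1) * STREAM_MBPS}

def selective_forwarding(n):
    # Send one copy to the server; receive everyone else's stream from it.
    return {"up": STREAM_MBPS, "down": (n - 1) * STREAM_MBPS}

for n in (5, 10, 40):
    print(n, "participants:", "mesh", full_mesh(n), "sfu", selective_forwarding(n))
```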
Just an FYI, full mesh would still require communicating with a third-party server, at the very least for initial networking when joining/leaving a group call.
The whole point of E2E encryption is so that passing data through a third party shouldn't matter in the first place.
And lastly, even when you have just a 1:1 video chat, sending and receiving full resolution/quality multimedia can still be way too much for some people's internet connections. UX is extremely important for Signal, as unreliable video chat is a surefire way for those less caring about privacy to hop back over to a privacy-violating alternative.
I feel sorry for those working on bringing security/privacy to everyone, as they have to appease power users and privacy absolutists, along with one's grandmother and the TikTok generation.
I guess if there is enough interest from users, we could look into adding a full mesh mode. It's been a year so far with group calls and no one has asked for it, probably because there isn't really much to be gained. Even 1:1 calls, which are p2p, sometimes need to be relayed through a TURN server.
"Even 1:1 calls, which are p2p, sometimes need to be relayed through a TURN server."
What percentage is "sometimes"? For example, maybe it is only where two callers are behind the same NAT or some other situation that could be rare relative to the majority of calls.
In any event, it sounds like TURN is not used unless it is needed. The same idea could apply with server forwarding, where forwarding is only used if it is needed. For a small group call, forwarding might not be necessary and full mesh might suffice.
As one user, I would be interested in a full mesh option alongside a server forwarding option.
There is P2P software for small groups I have always used that has a traffic forwarding (relaying) option after the peers are connected, but the default is direct connection, with no need for a relay. The forwarding option is useful for situations where it is needed, but it is not the default. Personally, I have never had to use it.
We don't have a percentage because for 1:1 calls, we don't even know how many calls there are (it's completely p2p), but I've often heard quoted somewhere between 5-10% of calls.
Yes, TURN is only used if needed in those cases, and, yes, you could use a forwarding server instead of a TURN server (although with different tradeoffs).
I'll add the full mesh option to our list of options for future work that we discuss regularly.
Yes, that's called "signaling". In the case of Signal, it's done through Signal messages, which pass through a server. But the server doesn't know they are calling related messages. They could be any message.