It sounds like this would have taken off if it were added to various managed clo...

toast0 · 2024-04-20T01:34:19 1713576859

I think it's a no-brainer if it's no effort or small effort (set a socket option on the client, somehow)... but it's a big effort to support it in a large load balancing situation.
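For reference, the client side on Linux can be about that small, though the opt-in is a protocol argument to `socket()` rather than a socket option. A minimal sketch, assuming Linux 5.6+ with MPTCP enabled; `open_stream_socket` is a made-up helper name, and the hard-coded 262 is the Linux value of `IPPROTO_MPTCP`, used only when the Python build doesn't export the constant (it was added in 3.10):

```python
import socket

# Linux protocol number for MPTCP; Python 3.10+ exposes it as socket.IPPROTO_MPTCP.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def open_stream_socket(family=socket.AF_INET):
    """Return (sock, used_mptcp). Hypothetical helper: tries MPTCP, falls back to TCP."""
    try:
        # Succeeds only where the kernel supports MPTCP and has it enabled.
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP), True
    except OSError:
        # Kernel (or OS) does not know the protocol: use plain TCP instead.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP), False
```

The returned socket is used like any other stream socket (`connect`, `send`, `recv`). Whether extra subflows actually get created depends on the path manager configuration on both ends, which this sketch does not touch.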

If you balance your load balancers with ECMP, I don't know if you can get two client streams to the same MPTCP terminating place.

If you've optimized the heck out of your TCP flows, this throws a wrench in there, because the second stream is likely to get hashed into a different NIC queue, and then you have communication between CPUs to move forward on the logical stream.

It would have been really handy though, and would have solved real issues with real users.

Edit to add: it could also solve some issues I saw on private networking / interserver networking... although the contention would be a much bigger problem on higher bandwidth streams. On networks with link aggregation, while there are many paths from one host to another, path selection is usually by hashing the connection 5-tuple {src ip, dst ip, protocol, src port, dst port}, so a long running TCP connection remains on the same path for the duration. If a path segment has high loss/corruption or is congested, MPTCP could help if you had an extra connection that hit a different path. Otherwise, you need to find the segment and get network operations to fix it. It's not easy to figure that out (I had to write a tool to sample and find port combinations with trouble, and then a patch for mtr to run a trace with fixed ports), and then you still need to reconnect your affected TCP sockets unless you can get a quick response from net ops. (Sometimes they can check error stats once the right devices are pointed out to them; replacing a cable/fiber often helps, and disconnecting it during investigation can let the traffic flow across the redundant links.)
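To make the 5-tuple point concrete, here is a rough sketch of hash-based path selection and of why changing only the source port can move a flow to another link. Everything in it is illustrative: `pick_path` and `ports_avoiding` are made-up names, SHA-256 stands in for whatever vendor-specific hash real switches use, and the addresses are documentation ranges.

```python
import hashlib

def pick_path(src_ip, dst_ip, proto, src_port, dst_port, n_paths):
    """Map a connection 5-tuple to one of n_paths links (stand-in for a switch's hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths

def ports_avoiding(bad_path, src_ip, dst_ip, proto, dst_port, n_paths, candidates):
    """Source ports whose flows would hash away from a known-bad path."""
    return [p for p in candidates
            if pick_path(src_ip, dst_ip, proto, p, dst_port, n_paths) != bad_path]

# Same 5-tuple, same path, for the whole life of the connection.
stuck_on = pick_path("192.0.2.10", "198.51.100.20", "tcp", 40000, 443, 4)
# Reconnecting (or adding a second subflow) from another port may land elsewhere.
alternatives = ports_avoiding(stuck_on, "192.0.2.10", "198.51.100.20", "tcp",
                              443, 4, range(40001, 40065))
```

A sampling tool like the one described above would do the reverse: probe many port combinations and report which ones consistently see loss, which points at the shared bad segment.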

vitus · 2024-04-20T02:51:12 1713581472

> If you balance your load balancers with ECMP, I don't know if you can get two client streams to the same MPTCP terminating place.

At Google, we do something similar with QUIC and connection migration. Our mechanism for ensuring these hit the same backend is Maglev [0], where we use the QUIC connection ID for hashing purposes in software. (Our routers still mostly use ECMP based on the 5-tuple, so being able to consistently hash to the same backend across multiple LB instances is crucial.)
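A simplified sketch of the idea, not the production code: every LB instance builds the same lookup table from the backend list, then indexes it with a hash of the connection ID instead of the 5-tuple. The table population loosely follows the permutation scheme described in the Maglev paper; the backend names, the table size of 251, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib

def _h(data: bytes, salt: bytes) -> int:
    """Salted 64-bit hash; SHA-256 is a stand-in for the real hash functions."""
    return int.from_bytes(hashlib.sha256(salt + data).digest()[:8], "big")

def build_table(backends, size=251):
    """Populate a Maglev-style lookup table. size must be prime and > len(backends)."""
    if not backends:
        raise ValueError("need at least one backend")
    offsets = [_h(b.encode(), b"offset") % size for b in backends]
    skips = [_h(b.encode(), b"skip") % (size - 1) + 1 for b in backends]
    nxt = [0] * len(backends)   # position of each backend in its own permutation
    table = [None] * size
    filled = 0
    while True:
        for i, name in enumerate(backends):
            # Walk backend i's preference permutation to its next free slot.
            slot = (offsets[i] + nxt[i] * skips[i]) % size
            while table[slot] is not None:
                nxt[i] += 1
                slot = (offsets[i] + nxt[i] * skips[i]) % size
            table[slot] = name
            nxt[i] += 1
            filled += 1
            if filled == size:
                return table

def pick_backend(table, connection_id: bytes):
    """Choose a backend from the connection ID alone, ignoring the 5-tuple."""
    return table[_h(connection_id, b"lookup") % len(table)]
```

Because the table depends only on the backend list, two LB instances reached via different ECMP paths compute the same table and send a given connection ID to the same backend, even after the client's address or port changes.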

> if a path segment has high loss/corruption or is congested, MPTCP could help if you had an extra connection that hit a different path.

Incidentally, we also have a family of internal mechanisms that do this, although we don't rely on MPTCP. (We instead twiddle some other bits in the packet that we make sure our routers use for hashing, at least for RPCs between prod machines.) This inspired some of the connection migration work in our QUIC implementation [1], wherein we can migrate to a different ephemeral port if we detect issues with the current path. This works shockingly often for routing around network problems.

[0] https://research.google/pubs/maglev-a-fast-and-reliable-soft...

[1] https://github.com/google/quiche/blob/main/quiche/quic/core/...