I think IP has two main issues: fragmentation and TCP.
The possibility of fragmentation means that we cannot rely on the "atomicity" of IP packets. We have to limit ourselves to conservative MTU sizes if we want reasonable performance on possibly unreliable networks. And there are security considerations as well. It would've been great if IP supported a reliable way to do MTU discovery (e.g. by replying with a special "error" packet containing MTU size) or had specified a larger guaranteed minimum MTU (ideally, 2^16 bytes).
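(For a sense of scale, here's the arithmetic behind those conservative sizes; the numbers are just standard header sizes, nothing specific to this thread:)

```python
# Rough sketch of the "conservative MTU" arithmetic.
IPV6_MIN_MTU = 1280   # every IPv6 link must carry at least this much
IPV6_HEADER = 40
UDP_HEADER = 8

# Largest UDP payload guaranteed to survive any standards-compliant IPv6 path
# without fragmentation:
safe_payload = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER
print(safe_payload)  # 1232 bytes; QUIC assumes an even more conservative ~1200
```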
Finally, ossification of the TCP protocol is a well-known problem. Ideally, we would have only IP packets with source and destination ports included, while TCP would be a QUIC-like protocol that is opaque to "dumb pipe" network hardware.
> It would've been great if IP supported a reliable way to do MTU discovery (e.g. by replying with a special "error" packet containing MTU size)
That’s literally how IP works. The problem is that some middleboxes block the ICMP error packet that carries the MTU, so in practice most hosts never see the benefit of the IPv4 "don't fragment" bit. IPv6 gets rid of the bit entirely, since routers no longer fragment at all (but the error packet remains). To combat this, DPLPMTUD was developed for QUIC, in which probe packets are used instead to discover the MTU, at the cost of global efficiency.
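For the curious, here's roughly what that looks like from user space on Linux (a sketch only; the address is a placeholder, and the option values come from Linux's `<linux/in.h>` since older Pythons may not export them as `socket.*` constants):

```python
import socket

# Classic PMTUD as seen from an application: set the DF bit, let the kernel
# process any ICMP "Fragmentation Needed" replies, and read back its cached
# path-MTU estimate. If a middlebox drops the ICMP error, the estimate simply
# never shrinks and oversized packets black-hole.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("192.0.2.1", 4433))  # placeholder destination
s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # set DF, never fragment locally

try:
    s.send(b"\x00" * 4000)  # oversized probe
except OSError:
    pass  # EMSGSIZE once the kernel knows the packet exceeds the known MTU

print("kernel path MTU estimate:", s.getsockopt(socket.IPPROTO_IP, IP_MTU))
```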
Yes, you are right. I completely forgot about ICMP Packet Too Big messages, since they are effectively not used in practice for application-level stuff. One minor difference between them and the behavior I had in mind is that the error packet would be sent directly to an application port (one of the reasons why it would be nice for IP packets to contain source/destination ports), making it convenient to process. IIUC, even assuming PTB messages are reliably delivered, we would need some kind of OS help (and additional APIs) to process them properly.
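For what it's worth, Linux does expose such an API today, though it's awkward to use: with IP_RECVERR enabled, ICMP errors are queued on the socket's error queue and can be drained with MSG_ERRQUEUE. A rough, Linux-only sketch (placeholder address, numeric fallbacks for constants that older Pythons don't export):

```python
import socket
import struct

IP_RECVERR = getattr(socket, "IP_RECVERR", 11)
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("192.0.2.1", 4433))  # placeholder destination
s.setsockopt(socket.IPPROTO_IP, IP_RECVERR, 1)
s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
s.send(b"\x00" * 1400)  # may elicit an ICMP error somewhere along the path

# Later (e.g. after poll() reports POLLERR), drain the error queue:
s.setblocking(False)
try:
    _, ancdata, _, _ = s.recvmsg(512, 512, socket.MSG_ERRQUEUE)
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_RECVERR:
            # struct sock_extended_err: errno, origin, type, code, pad, info, data;
            # for "Fragmentation Needed" errors, ee_info carries the reported MTU.
            _errno, _origin, icmp_type, icmp_code, _pad, ee_info, _data = \
                struct.unpack("=IBBBBII", cdata[:16])
            print("ICMP type/code:", icmp_type, icmp_code, "reported MTU:", ee_info)
except BlockingIOError:
    pass  # nothing on the error queue yet
```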
The problem with IP is not really a problem with IP, but a problem with middleboxes, which make deployment of new protocols above IP practically impossible. That's why QUIC rides over UDP rather than being a first-class transport protocol.
You can actually do better by having network hardware truncate and forward, instead of dropping, packets that are too large. Then the receiver can detect every packet that went through a route with a small MTU and know, precisely, the MTU of the route the packet took. The receiver can then tell the sender, at either the system level or the application level, about the discovered MTU via a conservatively sized message.
This allows you to easily detect MTU changes due to changed routing. You can easily determine or re-determine the correct MTU with a single large probe packet. You can feed back application-level packet loss information as part of the "packet too big" message, which is useful for not screwing up your flow control. The application can bulk-classify and bulk-report the "lost" packets, so you get more precise, complete, and accurate feedback that also costs fewer bytes. The network hardware can be dumber since it does not need to send feedback messages (for this case), which also brings the pipe closer to being a "dumb pipe" as a whole. Basically the only downside is that you forward "dead" traffic, resulting in unnecessary congestion, but you wanted to send that traffic anyway (and would have succeeded if the predicted MTU was correct), so that congestion was already assumed in your traffic pattern.
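To make the feedback part concrete, here's a purely hypothetical sketch of such a message; the layout and field names are invented for illustration, nothing like this exists on real networks:

```python
import struct

# Hypothetical "packet too big" report for the truncate-and-forward scheme:
# a 2-byte observed size, a 2-byte count, then that many 4-byte sequence
# numbers the receiver is classifying as lost due to truncation.
HDR = struct.Struct("!HH")

def encode_ptb_report(observed_size: int, truncated_seqs: list[int]) -> bytes:
    body = struct.pack(f"!{len(truncated_seqs)}I", *truncated_seqs)
    return HDR.pack(observed_size, len(truncated_seqs)) + body

def decode_ptb_report(msg: bytes) -> tuple[int, list[int]]:
    observed_size, count = HDR.unpack_from(msg)
    seqs = list(struct.unpack_from(f"!{count}I", msg, HDR.size))
    return observed_size, seqs

# Sender side: shrink the MTU estimate and requeue the reported packets.
report = encode_ptb_report(1400, [17, 18, 21])
mtu_estimate, lost = decode_ptb_report(report)
assert mtu_estimate == 1400 and lost == [17, 18, 21]
```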
I think truncating packets would make the lives of application developers significantly harder, since it breaks the nice atomicity property and requires developing ad-hoc feedback protocols that account for potentially truncated packets. Smaller MTUs are also more likely on the last mile, so dropping the packet and sending an error packet back would result in less "useless" global traffic.
I guess it could be both: the IP packet header could contain a flag that determines whether an oversized packet is truncated, or dropped with an error packet sent back.
I was using the "application" level to mean the consumer of the raw packet stream, which would generally be a transport protocol (i.e. TCP, UDP, QUIC, etc.), not an actual user application.
Truncation is trivial to support at the transport protocol level. UDP, TCP, and literally every other network protocol I can think of already encodes (or assumes) the "expected" non-truncated size, so you just compare that against the size reported by the hardware descriptors and either drop (which is what already always happens, just earlier in the chain) or enhance the protocol to generate feedback. Protocols literally already do that check (for security reasons, to avoid attacker-controlled lengths), so changing network hardware to truncate would likely already work with existing protocol stacks. Any enhancements for smarter behavior than just "drop if truncation detected" would only need to happen at the transport protocol/endpoint level, so they would be relatively easy to bolt on without making any other changes to the intervening network hardware, while staying compatible with "default drop" behavior for anybody who does not want the extra complexity.
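As a sketch of that check using UDP's length field (in real stacks the kernel does this validation before user code ever sees the datagram; this just shows the information needed to detect truncation is already on the wire):

```python
import struct

def udp_truncated(datagram: bytes) -> bool:
    """Return True if this raw UDP datagram (header + payload) is shorter
    than its own Length field claims, i.e. it was truncated in flight."""
    if len(datagram) < 8:  # not even a full UDP header
        return True
    # UDP header: src port, dst port, length, checksum (all 16-bit, big-endian)
    _src, _dst, length, _csum = struct.unpack("!HHHH", datagram[:8])
    return len(datagram) < length

# Example: a datagram whose header promises 1500 bytes but arrives short.
hdr = struct.pack("!HHHH", 5000, 4433, 1500, 0)
assert udp_truncated(hdr + b"\x00" * 100)       # truncated in flight
ok_hdr = struct.pack("!HHHH", 5000, 4433, 108, 0)
assert not udp_truncated(ok_hdr + b"\x00" * 100)  # intact
```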