The Robustness Principle Reconsidered (2011) (acm.org)
23 points by throw0101a on March 28, 2020 | 1 comment



"This also clarifies what to do when passing on a packet—implementations should not clear field X, even though that is the most "conservative" thing to do, because that would break the case of a version 1 implementation forwarding a packet between two version 2 implementations. In this case the Robustness Principle must include a corollary: implementations should silently ignore and pass on anything that they don't understand. In other words, there are two definitions of "conservative" that are in direct conflict."

Clearing the field is not 'the most "conservative" thing to do'. "Conservative" in the context of network protocols means not fooling with things you don't understand and don't need to fool with. Yet this is a very common misunderstanding on the part of both specifiers and implementers.
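A minimal sketch of that reading, assuming a hypothetical fixed-size header in which "field X" is just bytes a version 1 node assigns no meaning to: the forwarder copies the header verbatim and rewrites only the fields it actually owns, so anything it doesn't understand passes through untouched.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical wire header: a version 1 node defines 'version' and
     * 'hops'; the 'x' bytes ("field X") exist on the wire but mean
     * nothing to it. */
    struct hdr {
        uint8_t version;
        uint8_t hops;
        uint8_t x[6];
    };

    /* Forward a packet: copy everything verbatim, then touch only the
     * fields this implementation owns. Field X rides along unchanged,
     * so two version 2 nodes can still talk through a version 1 node. */
    void forward(const struct hdr *in, struct hdr *out)
    {
        memcpy(out, in, sizeof *out);
        out->hops = in->hops + 1;
    }

Likewise,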

"Now let's suppose that our mythical standard has another field Y that is intended for future use—that is, in a protocol extension. There are many ways to describe such fields, but common examples are to label them "reserved" or "must be zero." The former doesn't say what value a compliant implementation should use to initialize reserved fields, whereas the latter does, but it is usually assumed that zero is a good initializer. Applying the Robustness Principle makes it easy to see that when version 3 of the protocol is released using field Y there will be no problem, since all older implementations will be sending zero in that field."

"Must be zero" should probably be stricken from the protocol lexicon. If a field is unused, an implementation should not be looking at it at all. Instead, add wording such that all fields are zeroed before individual values are assigned.

"The problem occurs in how far to go. It's probably reasonable not to verify that the "must be zero" fields that you don't have to interpret are actually zero—that is, you can treat them as undefined. As a real-world example, the SMTP specification says that implementations must allow lines of up to 998 characters, but many implementations allow arbitrary lengths; accepting longer lines is probably OK (and in fact longer lines often occur because a lot of software transmits paragraphs as single lines)."

But this is exactly the kind of issue that attacks on the robustness principle point to. A server passing along a message with lines longer than 998 characters to an implementation that doesn't handle them is a buffer overflow waiting to happen. It's only considered a reasonable exception because it hasn't caused problems, not because it cannot.
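A minimal sketch of that failure mode, with a made-up receiver that trusts the 998-character limit (1000 bytes with CRLF): if an upstream server was liberal and relayed a longer line, the fixed buffer overflows.

    #include <string.h>

    /* Naive receiver: "knows" an SMTP line fits in 1001 bytes
     * (998 characters + CRLF + NUL). A longer line relayed by a
     * liberal upstream server overruns the buffer. */
    void handle_line(const char *line)
    {
        char buf[1001];
        strcpy(buf, line);          /* overflow waiting to happen */
        /* ... parse buf ... */
    }

    /* Being conservative on receipt as well: refuse (or truncate)
     * over-long lines instead of copying them blindly. */
    int handle_line_safely(const char *line)
    {
        char buf[1001];
        if (strlen(line) >= sizeof buf)
            return -1;              /* reject the over-long line */
        strcpy(buf, line);
        /* ... parse buf ... */
        return 0;
    }

On the other hand,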

"Many Web browsers have generally been willing to accept improper HTML (notably, many browsers accept pages that lack closing tags). This can lead to rendering ambiguities (just where does that closing tag belong, anyhow?), but is so common that the improper form has become a de facto standard—which makes building any nontrivial Web page a nightmare. This has been referred to as "specification rot."."

Would it have been possible for the HTTP/HTML environment to take off in the 1990s if browsers had not been very liberal in what they accepted? All of those Geocities pages were probably misformatted. HTTP servers, by definition, don't care what they're serving, so any errors would only appear in the browser, and probably not the browser of the page's author.

(By the way, back when HTML was an SGML standard, SGML supported and encouraged omitting unnecessary closing tags. HTML took advantage of this flexibility.)
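For example, this is a conforming list under the SGML-based HTML DTDs (and still is in HTML5), because the </li> end tags are optional and each <li> implicitly closes the previous item:

    <ul>
      <li>be conservative in what you send
      <li>be liberal in what you accept
    </ul>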



