The protocol has to be designed in a way that facilitates (or, at least, does no...

MichaelGG · on April 23, 2014

Line folding doesn't really help you manually interact with or craft messages. Nor do comments. There's no real value in being able to do:

  Foo:  ValueBegin (this is a comment)
    ValueEnds

Versus requiring Foo: ValueBegin ValueEnds

As far as performance goes, I must say you're incorrect. Review the nginx HTTP parser and you'll see all sorts of bitwise hacks used to improve performance. This aligns with my own experience writing a packet capture system for a similar protocol.
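To make "bitwise hacks" concrete, here is a minimal sketch of one common trick of that kind: folding ASCII case with `| 0x20` instead of calling a general lowercase routine. It is not nginx's actual code; the function name `header_name_equals` is hypothetical, and Python is used only for readability (the speed benefit applies to C-level parsers).

```python
def header_name_equals(data: bytes, expected_lower: bytes) -> bool:
    """Case-insensitive ASCII compare against an already-lowercase name.

    Hypothetical helper for illustration only. OR-ing a byte with 0x20
    maps 'A'-'Z' onto 'a'-'z', so the fold is applied only where the
    expected byte is a lowercase letter; otherwise bytes must match exactly.
    """
    if len(data) != len(expected_lower):
        return False
    for got, want in zip(data, expected_lower):
        if got == want:
            continue
        is_letter = ord("a") <= want <= ord("z")
        if not (is_letter and (got | 0x20) == want):
            return False
    return True
```

The letter check matters: without it, a carriage return (0x0D) OR-ed with 0x20 becomes `-` (0x2D) and would wrongly match a hyphen in the expected name.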

Having open syntax (comments and line folding being part of the problem) incredibly complicates the parser. Other moronic things are the completely arbitrary handling of header fields. Some header fields allow their value to be split over multiple lines but must be treated as if the value were on one line. Others use multiple headers to provide some multi-line value. There's no simple parsing; it all must be sensitive to the context. This is just stupid, yet free-text protocol authors revel in it. SIP even publishes a "torture test" RFC where they're just oh-so-pleased with the edge cases their moronic spec allows. They even suggest a parser should guess as to the message sender's intent.
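As a rough illustration of the extra work folding and comments impose, here is a minimal sketch of the two passes a parser needs before it can even look at a value. The function names `unfold_headers` and `strip_comments` are hypothetical, and the sketch assumes CRLF line endings and ignores quoted strings and backslash escapes, which a real parser would also have to handle per header field.

```python
def unfold_headers(raw: str) -> list[tuple[str, str]]:
    """Parse a header block, joining folded continuation lines (sketch)."""
    headers: list[tuple[str, str]] = []
    for line in raw.split("\r\n"):
        if not line:
            break  # a blank line ends the header block
        if line[0] in " \t":
            # Leading whitespace marks a continuation of the previous header.
            if not headers:
                raise ValueError("continuation line before any header")
            name, value = headers[-1]
            headers[-1] = (name, (value + " " + line.strip()).strip())
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), value.strip()))
    return headers


def strip_comments(value: str) -> str:
    """Drop parenthesized comments, which may nest (sketch, no quoting)."""
    out = []
    depth = 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    # Collapse the whitespace left behind where comments were removed.
    return " ".join("".join(out).split())
```

Fed the folded example from earlier in the thread, `unfold_headers` followed by `strip_comments` yields the same `Foo: ValueBegin ValueEnds` that the single-line form gives for free.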

I'd also note that in some cases the parser is most of the stack (simple proxy scenarios, frontend security). Regardless, the fact that the rest of the stack may be complicated is not in any way an excuse for making the parser worse. This is not a programming language.

rbanffy · on April 23, 2014

> you'll see all sorts of bitwise hacks in order to improve performance.

How much time does nginx spend parsing HTTP requests compared to waiting for network or disk IO?

> SIP even publishes a "torture test" RFC

This is actually very clever. I wish other protocols had something like it to help weed out partial and buggy implementations.

> They even suggest a parser should guess as to the message sender's intent.

This may be a little bit too much.

> This is not a programming language.

A lot of incredibly powerful uses for technology come precisely from the unintended scenarios - the clever ways to abuse technology and force it to do something it was never intended to. Remember HTTP itself was conceived to do a tiny subset of what it does now.

MichaelGG · on April 23, 2014

> How much time does nginx spend parsing HTTP requests compared to waiting for network or disk IO?

Enough that they choose to use much more obtuse code? Not to mention that network and disk IO scale independently of CPU, so it's not a relevant comparison. I wrote a packet capture system that spent about 70% of its capture/index CPU budget on parsing.

A torture test is only clever if it's not absurd because of a thousand edge cases. Then it just reflects the stupidity in the protocol design.

Can you name any possible use for terrible parsing rules that encourage security holes? HTTP is commonly used outside browsers because it's a simple wrapper for a TCP-RPC model. Request something, get a response. And none of the features depend on the stupid parsing rules: absolutely no one and nothing benefits from them. Except perhaps contractors billing hourly.

rbanffy · on April 24, 2014

> Enough that they choose to use much more obtuse code?

They certainly had it profiled before they optimized it. There must be some numbers somewhere. "Enough" is hardly an acceptable answer.