Streaming in this context means that the parsing code hands off some parsed structures to the code outside of the parser before the entire message is processed. Suppose now you have this message:
{ x: 1, y: 2, x: 3 }
The streaming parser reads field "x" with the value "1" and dispatches that value to the outside code, then reads "y", then reads "x" again. But the outside code had a side effect associated with that field, and it has already performed it (e.g. the outside code was printing the value of the field). Now you have a program with an error. The right output should have been:
y: 2
x: 3
but you got:
x: 1
y: 2
x: 3
That might not be so bad, depending on the circumstances, but if you are betting on the outcome of a soccer game...
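A toy illustration of the problem, assuming the message has already been tokenized into (key, value) pairs. `stream_dispatch` and `buffered_dispatch` are hypothetical names, and the buffered version deliberately orders keys by their last occurrence to reproduce the "right output" above; it is not how any particular Protobuf library orders fields.

```python
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, int]


def stream_dispatch(pairs: Iterable[Pair], on_field: Callable[[str, int], None]) -> None:
    """Naive streaming: hand every field to the caller as soon as it is read."""
    for key, value in pairs:
        on_field(key, value)


def buffered_dispatch(pairs: Iterable[Pair], on_field: Callable[[str, int], None]) -> None:
    """Non-streaming: resolve last-wins over the whole message, then dispatch once."""
    resolved: Dict[str, int] = {}
    for key, value in pairs:
        # Deleting before re-inserting moves a repeated key to the end, so the
        # dispatch order follows each key's *last* occurrence.
        resolved.pop(key, None)
        resolved[key] = value
    for key, value in resolved.items():
        on_field(key, value)


message = [("x", 1), ("y", 2), ("x", 3)]

streamed = []
stream_dispatch(message, lambda k, v: streamed.append(f"{k}: {v}"))
buffered = []
buffered_dispatch(message, lambda k, v: buffered.append(f"{k}: {v}"))

print(streamed)  # ['x: 1', 'y: 2', 'x: 3']  (the side effect for x: 1 already happened)
print(buffered)  # ['y: 2', 'x: 3']
```

The price of the buffered version is exactly the thing streaming tries to avoid: nothing reaches the caller until the whole message has been read.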
You could easily design a stream parser that rejects duplicates before it passes them off to the user, by maintaining a set of already encountered keys within the parser state. The space overhead isn't a concern unless your map/set has millions of entries, and at that size your non-streaming parser would have choked on the memory usage long before, anyway.
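A minimal sketch of that idea, again assuming the input is already a sequence of (key, value) pairs. `reject_duplicates` and `DuplicateKeyError` are hypothetical names, not part of any Protobuf library.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple


class DuplicateKeyError(ValueError):
    """Raised when a key shows up a second time in the same message."""


def reject_duplicates(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield pairs one at a time, failing as soon as a key repeats.

    The only extra parser state is the set of keys seen so far, so memory grows
    with the number of distinct keys, not with the size of the values.
    """
    seen: Set[str] = set()
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key {key!r}")
        seen.add(key)
        yield key, value


received = []
error: Optional[str] = None
try:
    for key, value in reject_duplicates([("x", 1), ("y", 2), ("x", 3)]):
        received.append((key, value))
except DuplicateKeyError as err:
    error = str(err)

print(received)  # [('x', 1), ('y', 2)]
print(error)     # duplicate key 'x'
```

Note that the consumer has already seen `x: 1` when the error fires; the sketch only guarantees that the duplicate itself is never handed over, so the caller still has to treat the whole message as failed.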
> You could easily design a stream parser that rejects duplicates before it passes them off to the user, by maintaining a set of already encountered keys within the parser state.
You could, but you are not allowed to. Protobuf parsing requires that the last duplicate key wins.
I see. But if this ambiguous repetition must be resolved, it must be resolved either at input time or at output time. Protobuf seems to have optimized for the output case by allowing scalar fields to be updated by appending.
It doesn't need to be resolved at input time. The Protobuf wire format allows repetition. To be more pedantic, the Protobuf wire format doesn't have a concept of dictionaries (hash tables, or whatever you want to call them); it only has lists. What the IDL defines as a "map" is, in fact, a list of key-value pairs, so there's no problem with repetition on the technical level.
However, the interpretation of the payload does require removing repetition, and in a bad way: a compliant implementation must discard all but the last matching key-value pair.
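To make the "list of key-value pairs" point concrete, here is a minimal hand-rolled decoder, assuming a hypothetical message `message M { map<string, int32> m = 1; }` and only non-negative values. It follows the documented wire encoding of a map as a repeated entry message with the key in field 1 and the value in field 2, and applies last-wins when folding the entries into a dict. The helper names are made up; real code would use what `protoc` generates.

```python
from typing import Dict, Iterator, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, payload); only wire types 0 and 2 are handled."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
            yield field, wire_type, value
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            yield field, wire_type, buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")


def encode_entry(key: str, value: int) -> bytes:
    """One map entry (key = field 1, value = field 2), wrapped as field 1 of M."""
    raw_key = key.encode("utf-8")
    entry = b"\x0a" + write_varint(len(raw_key)) + raw_key + b"\x10" + write_varint(value)
    return b"\x0a" + write_varint(len(entry)) + entry


def decode_map(buf: bytes) -> Dict[str, int]:
    """Fold the list of entries into a dict; a later entry overwrites an earlier one."""
    result: Dict[str, int] = {}
    for field, wire_type, payload in iter_fields(buf):
        if field != 1 or wire_type != 2:
            continue  # not the map field
        key, value = "", 0  # defaults if the entry omits key or value
        for f, _, v in iter_fields(payload):
            if f == 1:
                key = v.decode("utf-8")
            elif f == 2:
                value = v
        result[key] = value  # last one wins
    return result


wire = encode_entry("x", 1) + encode_entry("y", 2) + encode_entry("x", 3)
print(decode_map(wire))  # {'x': 3, 'y': 2}
```

On the wire, all three entries are present; the duplicate only disappears in the final fold.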
It's just plain stupid. But this stupidity isn't unique to Protobuf; there are plenty of formats that are stupid in the same way. For example, VHD (Microsoft's VM image format) puts some of the information needed to interpret the image in what they call a "footer" (i.e. at the end of the file). MOV (Mac QuickTime video) files, and at least early versions of MP4 files created on Macs, used to put some metadata at the end of the file, making it impossible to stream them.
Unfortunately, it often happens that people design a binary format for the first time, and then the format succeeds for unrelated reasons. We have tons of bad but popular formats. PNG is up there at the top of the list, together with HTML and PDF. Oh, and don't mention PSD: that one would probably take the cake when it comes to the product of how bad it is and how popular it is...
It sounds like you have decided this design is "stupid" because you don't understand the motivation. This feature allows a proxy to alter a protobuf message without decoding it. That is a significant efficiency win in some important cases.
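A minimal sketch of the proxy trick, assuming a message whose fields are all varint-encoded, non-repeated scalars; the function names are hypothetical and the bytes are hand-written rather than produced by `protoc`. Because a parser keeps the last value it sees for a non-repeated scalar field, the proxy can override field 1 by appending a single tag/value pair, without decoding or re-encoding anything that came before.

```python
from typing import Dict, Tuple


def write_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def override_varint_field(serialized: bytes, field_number: int, value: int) -> bytes:
    """Proxy side: append a new occurrence of the field; the original bytes are untouched."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return serialized + write_varint(tag) + write_varint(value)


def decode_varint_fields(buf: bytes) -> Dict[int, int]:
    """Receiver side: read varint-only fields; a repeated scalar keeps its last value."""
    fields: Dict[int, int] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if tag & 0x7 != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        fields[tag >> 3] = value  # last one wins
    return fields


original = bytes([0x08, 0x96, 0x01, 0x10, 0x02])  # field 1 = 150, field 2 = 2
patched = override_varint_field(original, 1, 7)

print(patched.hex())                   # 08960110020807
print(decode_varint_fields(original))  # {1: 150, 2: 2}
print(decode_varint_fields(patched))   # {1: 7, 2: 2}
```

The same rule is why parsing the concatenation of two serialized messages gives the same result as merging them: appending is cheap for the writer, and the cost of resolving duplicates moves to the reader.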