Protobuf is a great format with a lot of benefits, but it's missing one that I wish it could support: zero-copy, the ability to transport data between processes, services and languages with effectively zero time spent on serialization and deserialization.
It appears possible in some cases but it's not universally the case. Which means that similar binary transport formats that do support zero-copy, like Cap'n Proto, offer most or all of the perks described in this post, with the addition of ensuring that serialization and deserialization are not a bottleneck when passing data between processes.
I don’t understand this argument. It seems to originate from capnp’s marketing. Capnp is great, but the fact that protobuf can’t do zero-copy is more of an academic issue than a practical one. Applications that want to use a schema always need their own native types that serialize to and deserialize from binary formats. For protobuf you either bring your own or use the generated type. For capnp you have to bring your own. So a fair comparison of serialization cost would compare:
native > pb binary > native
vs
native > capnp binary > native
If you benchmark this, the two formats are very close. Exact perf depends on the payload. Additionally, one could write their own protobuf serializer with protoc if they really need to.
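For what it's worth, the round trip in that comparison can be timed with a small harness like the one below. It's only a sketch: `Point`, `encode`, `decode` and `round_trip_seconds` are made-up names, and the `struct`-based codec is a stand-in. You'd swap in the real protobuf and capnp encode/decode calls for your own schema before reading anything into the numbers.

```python
import struct
import timeit
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical native type the application actually works with.
    x: float
    y: float
    label: str


def encode(p: Point) -> bytes:
    # Stand-in encoder: two doubles, a 4-byte length, then the UTF-8 bytes.
    raw = p.label.encode("utf-8")
    return struct.pack("<ddI", p.x, p.y, len(raw)) + raw


def decode(buf: bytes) -> Point:
    # Stand-in decoder: the inverse of encode().
    x, y, n = struct.unpack_from("<ddI", buf, 0)
    start = struct.calcsize("<ddI")
    return Point(x, y, buf[start:start + n].decode("utf-8"))


def round_trip_seconds(value, enc, dec, number=10_000):
    # Times native -> binary -> native, which is the path being compared.
    return timeit.timeit(lambda: dec(enc(value)), number=number)


p = Point(1.5, -2.0, "origin")
assert decode(encode(p)) == p  # sanity check before timing anything
elapsed = round_trip_seconds(p, encode, decode, number=1_000)
```

Passing the encoder and decoder in as callables keeps the harness identical for both formats, so any difference you measure comes from the codecs and the payload rather than from the benchmark itself.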
It depends on how you actually use the messages. Zero-copy can slow things down. Copying within L1 cache is ~free, but operating on needlessly dynamic or suboptimal data structures can add overhead everywhere they're used.
To literally avoid any copying, you'd have to use the messages in their on-the-wire format directly as your in-memory data representation. If you have to read them many times, the extra cost of dynamic getters can add up (the format may cost you extra pointer chasing, unnecessary dynamic offsets, redundant validation checks and conditional fallbacks for defaults, even if the wire format is relatively static and uncompressed). It can also be limiting, especially if you need to mutate variable-length data (serializing is easy only when you're appending).
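To make the getter cost concrete, here's a toy sketch of reading fields straight out of a buffer. The layout (a u16 field count followed by little-endian i32 fields) and the `RecordView` class are invented for illustration and are not how capnp or any real format lays things out.

```python
import struct

# Hypothetical layout: u16 field count, then that many little-endian i32 fields.
HEADER = struct.Struct("<H")
FIELD = struct.Struct("<i")


class RecordView:
    """Reads fields directly from the buffer; nothing is copied up front."""

    def __init__(self, buf):
        self._buf = memoryview(buf)

    def _field(self, index, default):
        # Every read repeats the header read, the presence check,
        # the offset math and the default fallback.
        (count,) = HEADER.unpack_from(self._buf, 0)
        if index >= count:
            return default  # field absent, e.g. written by an older schema
        (value,) = FIELD.unpack_from(self._buf, HEADER.size + index * FIELD.size)
        return value

    @property
    def width(self):
        return self._field(0, default=0)

    @property
    def height(self):
        return self._field(1, default=1)


old_msg = HEADER.pack(1) + FIELD.pack(640)                    # writer only knew `width`
new_msg = HEADER.pack(2) + FIELD.pack(640) + FIELD.pack(480)  # writer knew both fields

view = RecordView(new_msg)
area = view.width * view.height  # two getter calls, two header re-reads
```

Nothing was decoded ahead of time, but each access to `width` or `height` pays for the checks again, which is the overhead that adds up when the same message is read many times.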
In practice, you'll probably copy data once from your preferred in-memory data structures to the messages when constructing them. When you need to read messages multiple times at the receiving end, or merge with some other data, you'll probably copy them into dedicated native data structs too.
If you change the problem from zero-copy to one-copy, it opens up many other possibilities for optimization of (de)serialization, and doesn't keep your program tightly coupled to the serialization framework.
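A rough sketch of the one-copy approach, assuming a made-up wire layout (`WIRE`) and a hypothetical `Image` type: decode and validate once at the boundary, then work with a plain native object that knows nothing about the serialization framework.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: i32 width, i32 height, u16 name length, then the name bytes.
WIRE = struct.Struct("<iiH")


@dataclass
class Image:
    # Plain native type; nothing here knows about the wire format.
    width: int
    height: int
    name: str

    def area(self) -> int:
        return self.width * self.height


def decode_image(buf: bytes) -> Image:
    # The single copy: validate and convert once, at the boundary.
    if len(buf) < WIRE.size:
        raise ValueError("truncated header")
    width, height, name_len = WIRE.unpack_from(buf, 0)
    if len(buf) < WIRE.size + name_len:
        raise ValueError("truncated name")
    name = bytes(buf[WIRE.size:WIRE.size + name_len]).decode("utf-8")
    return Image(width, height, name)


def encode_image(img: Image) -> bytes:
    # Building the message is one more copy, in the other direction.
    raw = img.name.encode("utf-8")
    return WIRE.pack(img.width, img.height, len(raw)) + raw


img = decode_image(encode_image(Image(4, 3, "a")))
img.name = "a much longer name"  # variable-length mutation is trivial on the native type
resent = encode_image(img)
```

After `decode_image` returns, reads are ordinary attribute accesses and the name can grow freely; the wire format only matters again when `encode_image` is called.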
Serialization issue. From the Introduction to Cap’n Proto:
"Cap’n Proto is INFINITY TIMES faster than Protocol Buffers. (...) there is no encoding/decoding step. The Cap’n Proto encoding is appropriate both as a data interchange format and an in-memory representation, so once your structure is built, you can simply write the bytes straight out".
I take it as a rationalization of what OLE Compound File Binary - internal Microsoft Office memory structures serialized "raw" as file format - would look like if they paid more attention to being backward and forward compatible and extensible.