Protobuf is a great format with a lot of benefits, but it's missing one that I w...

squirrellous · 2025-12-02T12:01:56 1764676916

I don’t understand this argument. It seems to originate from capnp’s marketing. Capnp is great, but the fact that protobuf can’t do zero copy should be more of an academic issue than a practical one. Applications that want to use a schema always need their own native types that serialize to and deserialize from binary formats. For protobuf you either bring your own or use the generated type. For capnp you have to bring your own. So a fair comparison of serialization cost would compare:

native > pb binary > native

vs

native > capnp binary > native

If you benchmark this, the two formats are very close. Exact perf depends on payload. Additionally, one could write their own protobuf serializer with protoc if they really need to.

pornel · 2025-12-02T05:04:08 1764651848

It depends on how you actually use the messages. Zero-copy can slow things down. Copying within L1 cache is ~free, but operating on needlessly dynamic or suboptimal data structures can add overhead everywhere they're used.

To actually literally avoid any copying, you'd have to directly use the messages in their on-the-wire format as your in-memory data representation. If you have to read them many times, the extra cost of dynamic getters can add up (the format may cost you extra pointer chasing, unnecessary dynamic offsets, redundant validation checks and conditional fallbacks for defaults, even if the wire format is relatively static and uncompressed). It can also be limiting, especially if you need to mutate variable-length data (it's easy to serialize when only appending).
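To make the getter cost concrete, here is a minimal sketch. The wire layout, `encode`, and `MessageView` are all made up for illustration; this is not how Cap'n Proto or FlatBuffers actually lay out data. It only shows the kind of work a getter over the wire bytes tends to repeat on every read.

```python
import struct

# Hypothetical layout (an assumption, not any real format):
#   bytes 0-3 : u32 id
#   bytes 4-7 : u32 offset of the name bytes (0 means "absent, use default")
#   bytes 8-9 : u16 name length
HEADER = struct.Struct("<IIH")


def encode(id_, name=None):
    """Build a message; only appending is needed, so this part is easy."""
    if name is None:
        return HEADER.pack(id_, 0, 0)
    raw = name.encode("utf-8")
    return HEADER.pack(id_, HEADER.size, len(raw)) + raw


class MessageView:
    """Reads fields straight from the buffer; nothing is decoded up front."""

    def __init__(self, buf):
        self._buf = memoryview(buf)

    @property
    def id(self):
        return struct.unpack_from("<I", self._buf, 0)[0]

    @property
    def name(self):
        # Every access repeats the offset lookup, the bounds check
        # and the default fallback.
        _, offset, length = HEADER.unpack_from(self._buf, 0)
        if offset == 0:
            return "unnamed"
        if offset + length > len(self._buf):
            raise ValueError("name out of bounds")
        # Python has to copy here to produce a str at all.
        return bytes(self._buf[offset:offset + length]).decode("utf-8")
```

Reading `view.name` in a hot loop pays for all of that each time, which is the overhead I mean.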

In practice, you'll probably copy data once from your preferred in-memory data structures to the messages when constructing them. When you need to read messages multiple times at the receiving end, or merge with some other data, you'll probably copy them into dedicated native data structs too.

If you change the problem from zero-copy to one-copy, it opens up many other possibilities for optimization of (de)serialization, and doesn't keep your program tightly coupled to the serialization framework.
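A rough sketch of the one-copy shape, again with a made-up format and a hypothetical `Reading` type rather than any real library's API: decode once at the boundary, validate once, and after that the rest of the program only touches a plain native struct.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: u32 sensor_id, f64 value, u16 label length.
_HEADER = struct.Struct("<IdH")


@dataclass
class Reading:
    sensor_id: int
    value: float
    label: str

    def to_bytes(self):
        # One copy out: native struct -> wire bytes.
        raw = self.label.encode("utf-8")
        return _HEADER.pack(self.sensor_id, self.value, len(raw)) + raw

    @classmethod
    def from_bytes(cls, buf):
        # One copy in: validate once, then never touch the buffer again.
        sensor_id, value, n = _HEADER.unpack_from(buf, 0)
        end = _HEADER.size + n
        if end > len(buf):
            raise ValueError("truncated message")
        label = bytes(buf[_HEADER.size:end]).decode("utf-8")
        return cls(sensor_id, value, label)
```

After `Reading.from_bytes(data)`, repeated reads are ordinary attribute accesses, and swapping the wire format later only means rewriting these two methods.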

jonny_eh · 2025-12-02T00:24:01 1764635041

Is that a format/serialization issue, or a library/implementation issue?

ElectricalUnion · 2025-12-02T03:14:28 1764645268

Serialization issue. From the Introduction to Cap’n Proto:

"Cap’n Proto is INFINITY TIMES faster than Protocol Buffers. (...) there is no encoding/decoding step. The Cap’n Proto encoding is appropriate both as a data interchange format and an in-memory representation, so once your structure is built, you can simply write the bytes straight out".

I take it as a rationalization of what OLE Compound File Binary - internal Microsoft Office memory structures serialized "raw" as file format - would look like if they paid more attention to being backward and forward compatible and extensible.

TillE · 2025-12-02T04:40:55 1764650455

Google has a library/format for that too, with FlatBuffers. Different use cases and advantages really, not clearly better/worse.

kragen · 2025-12-02T10:43:05 1764672185

Kenton Varda also worked on Protobufs at Google before he wrote CapnProto, I think.

__s · 2025-12-02T03:09:55 1764644995

Format: https://news.ycombinator.com/item?id=23589117