While it's true that XML is a generic serialization of angle-bracket markup that doesn't need a schema for serialization (which was exactly the main motivation for its "invention" as an SGML subset), the reason it's being used in inter-party or long-term/loosely-coupled service payload serialization is that it has fairly powerful and well-established schema languages (XML DTDs and XML Schema) for validation. This is unlike JSON, which thrives in ad-hoc serialization for tightly-coupled back-ends and web front-ends (and only in that scenario IMHO). So I don't think XML belongs in the "schema-less, copying" category.
Very true. IMO many people who hate XML config files just haven't used an IDE that validates against a schema. It's super nice to have auto-complete and property validation on config files, something not offered by JSON or YAML. A good reason to stick with XML for complicated configs.
It's one of the reasons I don't mind maven. Yeah, there's a 1000+ line XML config file, but maven's DTD is so tight that nearly any syntax issue will be flagged. Something easy to appreciate when you're used to giant config files that don't get validated until runtime.
Visual studio does schema validation for JSON and gives errors inline. There is a big list of supported schemas built in and you can define your own. Never actually looked at the format personally.
While I generally agree, I'm not sure maven's pom.xml (aka porn.xml), of all things, is a paragon of good markup design ;) For one, maven actively forbids/rejects use of ordinary XML entity references, and invents its own text expansion instead (so strictly speaking pom.xml isn't even XML proper). Then using XML for a relatively simple EAV format seems like overkill. But yeah, over a decade ago the maven developers had great plans to open up the format for alternative serializations/DSLs; thankfully they didn't, if only because they realized it isn't worth the maintenance effort. I'll add that a format for describing software builds is probably the wrong place to let a thousand blossoms bloom, something that you realize soon enough if you've worked as a freelancer in Java-ish projects for any amount of time, where every other project feels the urge to use the oh-so-great gradle as an alternative to pom.xml.
My comment was only aimed at service payload serialization; as to whether markup makes for a good config file format, I'm not entirely sure. It certainly is better than inventing an ad-hoc format IMO, but OTOH there are a couple of not-quite-SGML/XML config formats such as Apache's httpd.conf, or in fact maven's that just give XML/SGML a bad name IMHO, because they generally inherit the downsides of markup without bringing its benefits (such as being able to assemble a configuration from fragments using regular entity expansion).
Or even schema-for-type, schema-for-value-validation
Just as an aside. The new version of OpenAPI (v3.1 RC) for REST interfaces now fully supports the json-schema validation mechanism so we may see an uptake in use of json schema.
We currently use this to store security log data, but think it's an interesting midpoint between having no schema at all vs requiring schema registries to do useful work.
As far as I know, SBE was designed with financial protocols in mind, though possibly more for order entry (FIX-SBE) than market data. I'm not saying it is the case, but it's possible that it's better suited than capnproto or flatbuffers to the IEX market data that the article uses for the performance test.
I will consider it in future though, while I'm familiar with SBE it's not one I'd have thought of when thinking about serialisation.
SBE is a zero-copy schema-full serialization format. I don't think there's anything which limits the format to the financial domain. For example, here's [1] a toy example of a schema describing a car.
MessagePack/msgpack is great for that middle ground where JSON ser/des is too slow, but you don't have enough engineers to justify the maintenance burden of a heavier-weight schemaful protocol.
I did end up writing a simple schema verifier for Ruby (ClassyHash, on GitHub) in one of the jobs where I used msgpack, but I no longer have access to maintain it. My benchmarks showed msgpack+classyhash was faster than native JSON (didn't test oj I think) and other serialization formats, and faster than all the other popular Ruby schema validators at the time.
TL;DR: msgpack rocks, use it instead of JSON for internal services.
Copying is more a facet of the implementation than the architecture, and relates strongly to the language and runtime. There's no reason that protobuf needs to copy. The only reason most C++ protobuf libraries copy is because ownership in C++ is hard and that makes zero-copy hard to use safely. By contrast it's easier to write a protobuf codec in Go that just aliases everything, because the Go runtime keeps any referenced buffer alive and deletes it when it's not referenced. In any case, it's always been possible to have zero-copy protobuf, you just don't get that from the "Hello, World!" protobuf tutorial.
This is incorrect -- it is not possible to implement Protobuf in a way that achieves the notion of "zero-copy" that Cap'n Proto and FlatBuffers achieve.
This probably comes down to a disagreement on what "zero-copy" means.
Some people use the term "zero-copy" to mean only that when the message contains a string or byte array, the parsed representation of those specific fields will point back into the original message buffer, rather than having to allocate a copy of the bytes at parse time.
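A minimal sketch of that weaker notion, using a made-up length-prefixed format (not Protobuf or any real library): the parsed string is just a slice that aliases the original buffer, and bytes are only copied if the caller later asks for an owned value.

```python
def read_string(view, pos):
    """Read a 1-byte length prefix, then return a slice that aliases `view`."""
    length = view[pos]
    end = pos + 1 + length
    return view[pos + 1:end], end


buf = bytearray(b"\x05hello\x05world")   # two length-prefixed strings
view = memoryview(buf)

first, pos = read_string(view, 0)
second, _ = read_string(view, pos)

# Mutating the original buffer is visible through the slice, which shows
# that no bytes were copied at parse time.
buf[1] = ord("j")
first_text = bytes(first).decode()    # copying happens only here, on demand
second_text = bytes(second).decode()
```

The rest of the message structure (lengths, positions) still had to be walked to find the strings; only the payload bytes avoid the copy.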
Cap'n Proto and FlatBuffers implement a much stronger form of zero-copy. With them, it's not just strings and byte buffers that are zero-copy, it's the entire data structure. With these systems, once you have the bytes of a message mapped into memory, you do not need to do any "parse" step at all before you start using the message.
For example, if you have a multi-gigabyte file formatted with one of these, you can mmap() it, and then you can traverse the message tree to read any one datum without reading any bytes of the message except the chain of pointers (parent to child) leading to that one datum. Aside from the mmap() call, you can do all this without even allocating any memory at all.
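To make the mmap() idea concrete, here is a rough sketch. The file layout is invented for illustration (it is not Cap'n Proto's or FlatBuffers' real encoding), and `read_score` and `CHILD_OFFSET` are hypothetical names: a root pointer at offset 0 leads to a child record far into the file, and only those two locations are touched.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout, invented for this sketch (not Cap'n Proto's encoding):
#   offset 0:      uint64 byte offset of a child record
#   child record:  uint32 id, uint32 score
CHILD_OFFSET = 1 << 20   # put the child 1 MiB into the file


def read_score(path):
    """Follow one pointer and read one field; the rest of the file is untouched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (child,) = struct.unpack_from("<Q", m, 0)
            _id, score = struct.unpack_from("<II", m, child)
            return score


# Build a small test file; the gap between root and child is left as zeros.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<Q", CHILD_OFFSET))
    tmp.seek(CHILD_OFFSET)
    tmp.write(struct.pack("<II", 7, 99))
    path = tmp.name

score = read_score(path)
os.remove(path)
```

With a real multi-gigabyte file the operating system would only page in the regions that are actually dereferenced.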
That is absolutely not possible with Protobuf, because Protobuf encoding is a list of tag-value pairs each of which has variable width. In order to read any particular value, you must, at the very least, linearly scan through the tag-values until you find the one you want. But in practice, you usually want to read more than one value, at which point the only way to avoid O(n^2) time complexity while keeping things sane is to parse the entire message tree into a different set of in-memory data structures allocated on the heap.
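The linear scan can be sketched as follows. This is a hand-written illustration of the Protobuf wire format (varint tags carrying a field number and a wire type), not code from any Protobuf library; `find_field` is a hypothetical helper, and the deprecated group wire types are not handled.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def find_field(buf, wanted):
    """Walk the tag-value pairs in order until field number `wanted` is found."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:          # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:          # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:          # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number == wanted:
            return value
    return None


# field 1 = varint 150, field 2 = string "hi", field 3 = varint 7
message = memoryview(bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69, 0x18, 0x07]))
last = find_field(message, 3)    # must skip over fields 1 and 2 first
```

Because every value has a variable width, there is no way to compute where field 3 starts without decoding the lengths of everything before it.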
That is not "zero-copy" by Cap'n Proto's definition.
(Disclosure: I am the author of Cap'n Proto and Protobuf v2.)
Nothing of your comment was false but there's no intrinsic value to your stronger definition of zero-copy. A direct mapping to memory is not always optimal for performance. Indeed, there are high-performance computation packages that compress data structures in L1-cached-sized blocks, to save main memory bandwidth. So, you've used the word "achieve" to decorate an outcome that might not be optimal.
By the way I worked on protobuf performance at Google for years and we could never get flatbuffers to go any faster.
> no intrinsic value to your stronger definition of zero-copy.
Whoa, that's a very strong statement. But then the rest of the paragraph gets a lot weaker.
> there are high-performance computation packages that compress data structures in L1-cached-sized blocks
This seems like a non sequitur. Of course hand-tuned data structures can achieve higher performance than any serialization framework, but what does that have to do with zero-copy vs. protobuf? Are you suggesting that protobuf encoding would be a good choice for these people?
> So, you've used the word "achieve" to decorate an outcome that might not be optimal.
I'm not sure why "achieve" would imply "optimal". Of course whether this is an advantage depends on the use case.
There are many cases where zero-copy doesn't provide any real advantages. If you're just sending messages over a standard network socket, then yeah, zero-copy probably isn't going to make things faster. There are already several copies inherent in network communication.
But if you have a huge protobuf file on disk and you want to read one field in the middle of it, that's just not something you can do in any sort of efficient way. With zero-copy, you can do this trivially with mmap().
Or if you're communicating over shared memory between processes on the same machine, then the entire serialize/parse round trip required with protobuf is 100% waste. Zero-copy would let you build and consume the structure from the same memory pages.
These seem "intrinsically valuable"?
> we could never get flatbuffers to go any faster.
What use case were you testing? Did you test any zero-copy serializations other than flatbuffers?
I've heard from lots of people that say Cap'n Proto beat Protobuf in their tests... but it definitely depends on the use case.
> Or if you're communicating over shared memory between processes on the same machine, then the entire serialize/parse round trip required with protobuf is 100% waste.
This is the part I don’t agree with. What I’m saying is that there is value in using encoded structures not only between servers and not only over shared memory but even within a single process. Yes, you discard the ability to just jump to any random field, but that is not always important. Often it can be better to spend some compute cycles and L1 accesses to save main memory accesses. If you are having to make full access to some kind of data anyway, then packing it makes a ton of sense. Consider any kind of delta-encoded column of values ... you can’t seek within it, but if the deltas are smaller than the absolutes, this can save massive amounts of main memory bandwidth. This is why I argue that representing something as a C struct in main memory is not obviously advantageous, outside some given workloads.
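A toy illustration of the delta-encoded column, with made-up timestamp values and hypothetical helper names: the absolutes need 8 bytes each, while the differences fit in one byte, at the cost of having to run a prefix sum to get any element back.

```python
from array import array
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each successive difference."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the absolutes with a running sum; there is no random access."""
    return list(accumulate(deltas))


timestamps = [1_600_000_000, 1_600_000_003, 1_600_000_004, 1_600_000_009]
deltas = delta_encode(timestamps)

raw = array("q", timestamps)     # 8 bytes per absolute value
packed = array("b", deltas[1:])  # 1 byte per delta (each fits in -128..127)
raw_bytes = raw.itemsize * len(raw)
packed_bytes = 8 + packed.itemsize * len(packed)   # first value + deltas
```

Whether the saved memory traffic outweighs the decode work depends on the access pattern, which is the trade-off being argued here.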
As for flatbuf at google I’m sure you’re aware that the only way to get the kind of mindshare you’d need to ship it would be to make websearch measurably faster.
OK, so we're talking about a use case where you're compressing data in main memory and trying to decompress it only within L1 cache. I guess there must be a lot of data sitting around in RAM that isn't accessed very often. Search index leaf nodes I suppose?
It doesn't seem to me like Protobuf is ideal for this use case, but sure, I see how the light compression afforded by Protobuf encoding could lead to a win vs. bare C structs.
I think a better answer here, though, would be to use an actual compression algorithm that has been tuned for this purpose.
Of course, then the uncompressed data needs to be position-independent, so no (native) pointers. You could use something hand-rolled here... but also, this is exactly the problem zero-copy serializations solve, so they might be a good fit here. Hmm!
I'd be pretty interested to compare layering a zero-copy serialization on top of compression vs. protobuf encoding in these high-performance computing scenarios. Is that something you tried doing?
> Yes, you discard the ability to just jump to any random field, but that is not always important.
I don't think this is the criticism being raised.
The main criticism being raised against non-zero-copy serialization is that this often requires maintaining different memory representations for the same value - the copies are the consequence of transforming from one representation to another one.
We do that all the time in high-performance computing. You keep a packed representation in memory and unpack it in small pieces to operate on it. Sparse matrices, compressed columns, etc. This is not evil, it’s an adaptation to the way the machine works. Saying that Kenton’s definition of zero-copy is unconditionally better is an aesthetic argument and I don’t buy it.
They are not talking about "unpacking on the fly for processing", but rather about "unpacking on memory to be able to call an opaque API outside your control that expects the unpacked representation". That requires copying in-memory to interface with that API.
Your approach only works if you are willing to "re-implement the world" to interface with whatever packed format suits your application.
With zero-copy serialization you don't have to do that.
There is a big difference to me. I haven't used any of those systems but I have written plenty of console games where the artists and designers want to fill memory. That means I don't have memory for 2 representations, unparsed and parsed. I also want fast loading so loading say 4k at a time into some temp buffer and parsing into memory is also out. I load the file directly into memory, fix up the pointers, and use it in place.
Not-Faster on what platform? In Borg, in Google3 code, deployed on a fast machine with a nice fast wide memory bus and a large cache?
What about in embedded code, or in a game? A place where memory bandwidth is scarce, or where we're trying desperately to reduce the number of syscalls and jumps back and forth between kernel and user space?
Having the entire payload memory mapped, and copies avoided, makes an absolutely huge difference once these kinds of concerns are real. Having something mmap'd in theory means it's nice and hot and fresh in the kernel's mind, and potentially kept in cache.
Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship. Using something like flatbuffers would have made a lot more sense.
At Google protobuf is the veritable "once I have a hammer everything looks like a nail", to the point where I've seen protobufs in JSON payloads, or vice versa, or even deeper... protobuf in JSON inside a Chromium Mojo payload... because, well, how good could it be without protobuf?
> Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship.
I shipped an embedded project using an RTOS and < 256kB RAM using protocol buffers (even nested ones) and zero-copy deserialization for byte arrays some time ago. We used nanopb, and it worked just fine if you understand how it works - although it certainly is less pleasant to work with than a fuller implementation which would just have copied out the internal bytes into new arrays and not have let us deal with lots of internal pointers into byte arrays.
Overall using Protocol Buffers was a great success in that project, since we could share schemas between IoT devices and the backend, and were able to generate a lot of code which would otherwise have been hand-written.
> Using something like flatbuffers would have made a lot more sense.
It might be able to solve the same problem. But it also needs to answer the questions: Is a suitable library available for all consumers of the data-structure? I don't think this was the case for our use-case, so it wouldn't have made more sense to use it back then.
> my argument is that the copying nature of protobuf can save memory bandwidth.
Huh? An extra pass over the data to parse it obviously uses more memory bandwidth.
I might understand your argument if parsing converted the data into a memory-bandwidth-optimized format, but the protobuf parsed form is certainly not that, unless things have changed very drastically since I worked on it.
Right but the parsed form is exactly what I meant when I said it’s an aspect of the implementation. The generated C++ code you get from Google’s protoc is a sparse thing, to be sure. But that is an artifact of the implementation. You can do anything you want with a protobuf and there are numerous independent implementations in the wild, in many languages.
I believe the point is that the actual wire format of protobuf is not amenable to memory mapping and direct manipulation or access. So to use it this way a copy would have to be made. The appeal of cap'n'proto and flatbuffers is the ability to map and work with the serialized format with minimal overhead.
At Google scale protobuf works perfectly fine. It's our lingua franca and a lot of work (apparently by yourself included) has gone into making it performant. But it comes with normative lifestyle assumptions.
Sure you could change the wire format and implementation to be mappable; but then it wouldn't be compatible with the mainstream implementation.
Couldn't find your ldap internally.. would be happy to chat on what you tried to make flatbuffers go faster. Mine you can find on the go link page for flatbuffers.
FlatBuffers can be accessed instantly without deserialization or allocation, so clearly in some cases a huge speedup is possible. If in your case there was no speedup, there must be other bottlenecks.
> there's no intrinsic value to your stronger definition of zero-copy
> A direct mapping to memory is not always optimal for performance.
Your comment seems to imply that the first statement follows from the second, but it does not. You're right, a direct memory mapping might not be optimal in some situations, but then again in some others it might be optimal. So this feature isn't always useful to everyone, but it doesn't follow that it's never useful to anyone.
Even if you worked on this on a range of projects at Google, isn't it possible that there are people working on systems that have rather different performance characteristics than Google's systems?
> Aside from the mmap() call, you can do all this without even allocating any memory at all.
I think this is the important point when it comes to discussing zero-copy. I've written a custom protobuf implementation for java which can do exactly that.
It's a bit tricky since protobuf supports recursive messages and java's Unsafe is not as powerful as what you can have in C++. My trade-off was to require the caller to pre-allocate messages needed before parsing the data. This works great when working with multi-gigabyte files where you want to process a large number of (possibly nested) messages, but is not as ergonomic as normal protobuf code.
It obviously doesn't come for free, as you need to do a linear scan to find those tag-values, but there are ways to speed that up too, so it becomes very fast in practice.
I'm sure Cap'n Proto and FlatBuffers are faster for some use-cases (I haven't tested), but a very important point for me is to be wire-compatible with protobuf3 and its ecosystem... and still be zero-copy/zero-alloc.
Author of protobluff [1] here - a zero-copy, mostly stack-based implementation for Protocol Buffers in C. Yes, you must scan through the message to find the corresponding tag/value pair you're interested in, but in many cases, it can be fast enough if you're only interested in a few values of a large message. Sure, using a vtable is much faster, as it provides O(1) lookup semantics, but sometimes you may not be able to change to another format without touching the entire stack.
Sounds like there's a point to be made for referential integrity too. That is, if a struct contains string A twice, when you read it back in you'd want both those pointers to be identical. You'd get this for free with Cap'n Proto, but it would require extra care with "one-copy" or looser definitions of "zero-copy."
Heh, well, you could get it for free in Cap'n Proto if Cap'n Proto allowed pointer aliasing. It doesn't, though, because if it did, then messages would not be trees, they'd be graphs, which ruins a lot of stuff. For example, a very common thing to do with a message is copy one branch of the tree into a different message. Deep-copying a branch of a tree is easy. Deep-copying a branch of a graph, though -- what does that even mean?
For a DAG, maybe, but it requires a lot more bookkeeping. Now you have to remember all the pointers you've seen before in order to detect dupes. To do that you probably need a hash map and some dynamic memory allocation, ugh.
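A small sketch of the bookkeeping difference, using plain dicts as stand-in nodes (nothing here reflects Cap'n Proto's API): copying a tree is a simple recursion, while copying a DAG needs a map of already-seen nodes to keep shared children shared.

```python
def copy_tree(node):
    """Plain recursive deep copy; shared children get duplicated."""
    return {"value": node["value"],
            "children": [copy_tree(c) for c in node["children"]]}


def copy_dag(node, seen=None):
    """Deep copy that preserves sharing by remembering every node it has seen."""
    if seen is None:
        seen = {}                      # the extra hash map and allocation
    key = id(node)
    if key not in seen:
        clone = {"value": node["value"], "children": []}
        seen[key] = clone
        clone["children"] = [copy_dag(c, seen) for c in node["children"]]
    return seen[key]


shared = {"value": "s", "children": []}
root = {"value": "r", "children": [shared, shared]}

tree_copy = copy_tree(root)   # two separate copies of `shared`
dag_copy = copy_dag(root)     # one copy of `shared`, referenced twice
```

The `seen` map only lives for one call here, so copying two branches separately would still duplicate any children they share.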
And what happens if you copy two different branches of one message into another, and they happen to share some children? Do you have to keep your seen-pointer map around across multiple copies?
For a cyclic graph, things get more confusing. Copying one branch of a fully-connected cyclic graph always means copying the entire graph. Apps can easily get into trouble here. Imagine an app that implements its own tree structure where nodes have "parent" pointers. If they try to copy one branch into another message, they accidentally copy the entire tree (via the parent pointers) and might not even realize it.
The one way that I think pointer aliasing could be made to work is if pointers that are allowed to alias are specially-marked, and are non-owning. So each object still has exactly one parent object, but might have some other pointers pointing to it from elsewhere. A copy would not recurse into these pointers; it would only update them if the target object happened to be part of the copy, otherwise they would have to become null.
But I haven't yet had any reason to try implementing this approach. And apps can get by reasonably well without it, by using integer indexes into a table.
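The index-into-a-table workaround might look roughly like this. The `Message` class and its field names are invented for illustration: shared values live once in a table owned by the message, and everything else refers to them by integer index, so the structure stays a tree.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    strings: list = field(default_factory=list)   # shared values, stored once
    people: list = field(default_factory=list)    # entries hold indexes only

    def intern(self, s):
        """Return the table index for `s`, adding it if it is not there yet."""
        if s not in self.strings:
            self.strings.append(s)
        return self.strings.index(s)

    def add_person(self, name, city):
        self.people.append({"name": self.intern(name),
                            "city": self.intern(city)})

    def city_of(self, i):
        return self.strings[self.people[i]["city"]]


msg = Message()
msg.add_person("Ann", "Oslo")
msg.add_person("Bo", "Oslo")
same_city = msg.people[0]["city"] == msg.people[1]["city"]
```

Comparing two indexes gives the "same string" check that pointer identity would have given, without any aliasing inside the message.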
This isn't a question of finding a magical solution, it's a question of trade-offs in performance, complexity, and usability that any solution necessarily imposes, and whether those trade-offs are worth it to support a feature that 99% of use cases don't need.
"Skyscrapers have been solved since the 20's" doesn't answer whether I should use steel beams when building a house.
Worth noting that by the weak definition of zero-copy, JSON can also have zero-copy deserialization; many embedded JSON libraries either write a null terminator in the original buffer or return a pointer and a length.
Could you point to documentation on how Cap'n Proto achieves this? Does it keep a header with offsets of byte positions for individual fields? What happens when a variable-sized field is edited?
Records in Cap'n Proto are laid out like C structs. All fields are at fixed offsets from the start of the structure. For variable-width values, the struct contains a pointer to data elsewhere in the message.
Each new object is added to the end of the message, so that the message stays contiguous. This does imply that if you resize a variable-width object, then it may have to be moved to the end of the message, and the old space it occupied becomes a hole full of zeros that can't really be reused. This is definitely a down-side of this approach: Cap'n Proto does not work great for data structures that are modified over time. It's best for write-once messages. FlatBuffers has similar limitations, IIRC.
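A simplified model of that layout and of the resize behaviour. The record format below is made up for this sketch and is not the actual Cap'n Proto encoding; it also always moves a rewritten value to the end, whereas the description above only says it may have to.

```python
import struct

# Hypothetical record layout (NOT the real Cap'n Proto wire format):
#   offset 0: uint32 age
#   offset 4: uint32 name_offset (absolute byte offset into the message)
#   offset 8: uint32 name_length
RECORD = struct.Struct("<III")


def set_name(msg, name):
    """Append the new name at the end; the old bytes become a hole of zeros."""
    _age, old_off, old_len = RECORD.unpack_from(msg, 0)
    msg[old_off:old_off + old_len] = bytes(old_len)
    struct.pack_into("<II", msg, 4, len(msg), len(name))
    msg.extend(name)


def get_name(msg):
    _age, off, length = RECORD.unpack_from(msg, 0)
    return bytes(msg[off:off + length])


def new_message(age, name):
    msg = bytearray(RECORD.size)        # fixed-size part, fields at fixed offsets
    struct.pack_into("<I", msg, 0, age)
    set_name(msg, name)
    return msg


msg = new_message(30, b"Bob")
set_name(msg, b"Alexandra")             # longer value: moved to the end
hole = bytes(msg[12:15])                # where "Bob" used to live
```

After the update the message is 24 bytes long, of which 3 are dead space, which is why this style suits write-once messages better than frequently edited ones.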
I think all my major projects are in my profile... Cloudflare Workers (and Cap'n Proto, which it uses) are my day job; I don't get much time to work on other projects these days unfortunately.
Indeed, there were several projects that can replace the generator (really, you can come up with something completely different). One of the latest I've seen is https://perfetto.dev/docs/design-docs/protozero
I'm building Concise Encoding, which is schema-less, with a 1:1 compatible text and binary representation. The binary format supports zero copy for string and binary values.
The reference implementation (in Go) is 90% complete, enough to marshal objects except for recursive support.
but i really do wonder why this is such a matter of debate. if performance and specific semantics are an issue...just use the standard tricks and write bytes into a buffer and push it onto the wire.
if performance isn't an issue, then just use any of these tools. unless the tooling cost and representational issues make it easier to just use bytes.
all for abstractions..but people seem to be blind to the idea that there's a perfectly good one a short step down from capn proto
No, not in the way that Cap'n Proto and FlatBuffers are.
Protobuf can support zero-copy of strings and byte arrays embedded in the message. Cap'n Proto and FlatBuffers support zero-copy of the entire message structure.
(Disclosure: I'm the author of Cap'n Proto and Protobuf v2.)
> Protobuf can support zero-copy of strings and byte arrays embedded in the message.
Just to be clear, even this more limited notion of "zero copy" isn't currently supported in the open-sourced version of protobuf. It is supported in the internal Google version, which is presumably what you're thinking of. There is an open issue [1] tracking the possibility of making it available in the open source version too.
This HN article, though, is about FlexBuffers. FlexBuffers appears to be based on FlatBuffers, but does not use schemas. Cap'n Proto, FlatBuffers, and Protobuf are all schema-driven (you must define your message types in a special language upfront). FlexBuffers is more like JSON in that all types are dynamic.
Personally I'm a strong believer that schemas are highly desirable, but some people argue that schema-less serializations let you get stuff done faster. I think it's very analogous to the argument between type-safe languages vs. dynamically-typed languages. Obviously there are a lot of smart people on both sides of these arguments.
One use case where schema-less is the way to go is when you provide the infrastructure but have no "ownership" of the data it will be used for. E.g. you build a logging or analytics tool where customers can send arbitrary data. Or a document database as a matter of fact. There, schema-less / self-describing data is a must.
Not necessarily. For logging/analytics, you could have customers upload their schema when configuring the service. I would think that doing so would allow for some powerful optimization opportunities, enabling your service to save quite a bit of CPU and maybe some bandwidth, too. It would probably also allow you to provide a better user experience, like making it easier to construct dashboards and such because you actually know how the data is structured.
For a document database, I don't agree at all. Some time back I spent more time than I'd like developing on Mongo, and boy did I wish I could actually tell it the schema of documents in each collection and have it enforce that (not to mention optimize based on it). A lot of developers actually use libraries on top of Mongo to define and enforce schemas.
True, I guess it all boils down to ease of use (convenience). You can build a system which accepts schema + data and builds dashboards and data-relevant processing + optimisations, but that results in a much more complex system with a higher entry burden. Sadly, convenient systems always get broader adoption.
Yeah, I see this as pretty similar to the debate over type-safe vs. dynamic languages. It used to be that people argued that dynamic languages were just way easier to use, and type-safe languages were just an optimization. But I think the real issue was that the tooling enabled by type-safe languages didn't exist yet. These days, TypeScript is no faster than JavaScript but is very popular, because of the tooling it enables, like editors with auto-complete and jump-to-definition.
I think protocols still don't have the level of tooling that makes it really obvious why schemas are better.
Flatbuffers can support some writeable updates and its language support is pretty comprehensive.
> Personally I'm a strong believer that schemas are highly desirable
I'm currently working on a project that's very executable size sensitive, and having a fixed schema is crucial to that, as we can optimize a lot more aggressively. We can even use dead code feedback to identify schema parts that aren't consumed on the client.
Thanks for linking the article. Despite the original article being about FlexBuffers, I happen to have been looking at FlatBuffers vs Cap'n Proto today.
That article is a bit old; is there anything that stands out to you in the last ~5 years where things have diverged?
I haven't kept track of FlatBuffers so I don't know what might have changed there, except that I imagine they support a lot more languages now (probably more than Cap'n Proto honestly). Cap'n Proto's serialization layer hasn't changed very much in those 5 years; development focus has been more on the RPC system.
I suppose you could layer something on top. But I think it would be tricky to come up with something with satisfying performance properties. A naive encoding of JSON into Cap'n Proto would result in messages that are much larger than JSON messages, because of all the pointers and padding and text field names.
I haven't looked into exactly how FlexBuffers work but off the top of my head I suspect that leveraging FlatBuffers' "virtual table" technique probably helps here. In (normal, schema-ful) Cap'n Proto, fields within a struct have fixed offsets, meaning that unused fields still take space. As I understand it, FlatBuffers tries to avoid this by adding an extra layer of indirection -- each struct has a sort of "virtual table" which stores the offsets of each field, where some fields might not be present at all. If multiple structs in a message happen to end up with the same virtual table, then the virtual table is only written once.
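A toy model of that indirection, based only on the description above and not on FlatBuffers' actual binary layout (`Builder` and its methods are hypothetical): each table stores just the fields that are present, a vtable maps field slots to positions, and identical vtables are stored once.

```python
class Builder:
    def __init__(self):
        self.vtables = []   # unique vtables, each a tuple of positions
        self.tables = []    # (vtable_index, values actually present)

    def add_table(self, fields, num_slots):
        """fields maps slot number -> value; absent slots store no value."""
        values, vtable = [], []
        for slot in range(num_slots):
            if slot in fields:
                values.append(fields[slot])
                vtable.append(len(values))   # 1-based position; 0 means absent
            else:
                vtable.append(0)
        vtable = tuple(vtable)
        if vtable not in self.vtables:       # share identical vtables
            self.vtables.append(vtable)
        self.tables.append((self.vtables.index(vtable), values))
        return len(self.tables) - 1

    def get(self, table_index, slot, default=None):
        vt_index, values = self.tables[table_index]
        pos = self.vtables[vt_index][slot]
        return default if pos == 0 else values[pos - 1]


b = Builder()
t1 = b.add_table({0: 1, 5: "x"}, num_slots=8)
t2 = b.add_table({0: 2, 5: "y"}, num_slots=8)   # same shape: reuses the vtable
t3 = b.add_table({3: 9}, num_slots=8)           # different shape: new vtable
```

Three tables end up with two vtables, and the six unused slots in each table cost nothing in the table body itself.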
Totally speculating here since, again, I haven't actually looked at FlexBuffers, but if I were building something isomorphic to JSON on top of FlatBuffers, I'd probably look into extending the virtual tables to index fields by name rather than number. So if you have two structures with the same set of field names, they can share a virtual table, and those field names only have to appear once. That'd be a pretty great way to compress JSON.
Back in Cap'n Proto, we don't have these vtables. For data with fixed schemas, my opinion is that these vtables seem like they require more bookkeeping than they are worth. But for dynamic schemas they seem like a much bigger win. So if you wanted to encode dynamic schemas layered on top of Cap'n Proto, you'd probably have to come up with some similar vtable thing yourself.
FlexBuffers are actually not built on top of the FlatBuffers encoding, they have their own special purpose encoding, which tries to be as compact as possible while still allowing in-place access (details, search for FlexBuffers here: https://google.github.io/flatbuffers/flatbuffers_internals.h...).
Funny you should say vtables may not be worth it.. I was of a similar opinion (why would you have many fields that are not in use??) until people showed me some of the Protobuf schemas in use at Google, with hundreds of fields, most unused. This is what pushed me in the direction of the vtable design.
Data always starts out neatly.. but the longer things live, the more this kind of flexibility pays off.
> some of the Protobuf schemas in use at Google, with hundreds of fields, most unused
Yeah, I was quite familiar with those back when I was at Google. But, IIRC, a lot of them were semantically unions, but weren't actually marked as such mostly because "oneof" was introduced relatively late in Protobuf's lifetime.
Agreed, most use cases are not this extreme. But I saw them as an "upper bound" on how people would stretch a serialization system. I didn't want to be the guy saying "640k should be enough for everyone".
No joke, I originally was arguing for 8-bit vtable entries because surely no-one ever needs more than 256 bytes worth of fields. Good thing my co-workers were smarter than me.
And yes, FlatBuffers has built-in unions from day 1, which was probably helpful.
Schema-ful, copying: Protobuf, Thrift, plenty more
Schema-ful, zero-copy: Cap'n Proto, FlatBuffers
Schema-less, copying: JSON (binary and other variants included), XML
Schema-less, zero-copy: FlexBuffers (Any others? This seems new to me)