Not-Faster on what platform? In Borg, in Google3 code, deployed on a fast machine with a nice fast wide memory bus and a large cache?
What about in embedded code, or in a game? A place where memory bandwidth is scarce, or where we're trying desperately to reduce the number of syscalls and jumps back and forth between kernel and user space?
Having the entire payload memory mapped, and copies avoided, makes an absolutely huge difference once these kinds of concerns are real. Having something mmap'd in theory means it's nice and hot and fresh in the kernel's mind, and potentially kept in cache.
Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship. Using something like flatbuffers would have made a lot more sense.
At Google protobuf is the veritable "once I have a hammer everything looks like a nail", to the point where I've seen protobufs in JSON payloads, or vice versa, or even deeper... protobof in JSON inside a Chromium Mojo payload... because, well, how good could it be without protobuf?
> Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship.
I shipped an embedded project using an RTOS and < 256kB RAM using protocol buffers (even nested ones) and zero-copy deserialization for byte arrays some time ago. We used nanopb, and it worked just fine if you understand how it works - although it certainly is less pleasant to work with than a fuller implementation which would just have copied out the internal bytes into new arrays and not have let us deal with lots of internal pointers into byte arrays.
Overall using Protocol Buffers was a great success in that project, since we could share schemas between IoT devices and the backend, and were able to generate a lot of code which would otherwise have been hand-written.
> Using something like flatbuffers would have made a lot more sense.
It might be able to solve the same problem. But it also needs to answer the questions: Is a suitable library available for all consumers of the data-structure? I don't think this was the case for our use-case, so it wouldn't have made more sense to use it back then.
> my argument is that the copying nature of protobuf can save memory bandwidth.
Huh? An extra pass over the data to parse it obviously uses more memory bandwidth.
I might understand your argument if parsing converted the data into a memory-bandwidth-optimized format, but the protobuf parsed form is certainly not that, unless things have changed very drastically since I worked on it.
Right but the parsed form is exactly what I meant when I said it’s an aspect of the implementation. The generated C++ code you get from Google’s protoc is a sparse thing, to be sure. But that is an artifact of the implementation. You can do anything you want with a protobuf and there are numerous independent implementations in the wild, in many languages.
I believe the point is that the actual wire format of protobuf is not amenable to memory mapping and direct manipulation or access. So to use it this way a copy would have to be made. The appeal of cap'n'proto and flatbuffers is the ability to map and work with the serialized format with minimal overhead.
At Google scale protobuf works perfectly fine. It's our lingua franca and a lot of work (apparently by yourself included) has gone into making it performant. But it comes with normative lifestyle assumptions.
Sure you could change the wire format and implementation to be mappable; but then it wouldn't be compatible with the mainstream implementation.
What about in embedded code, or in a game? A place where memory bandwidth is scarce, or where we're trying desperately to reduce the number of syscalls and jumps back and forth between kernel and user space?
Having the entire payload memory mapped, and copies avoided, makes an absolutely huge difference once these kinds of concerns are real. Having something mmap'd in theory means it's nice and hot and fresh in the kernel's mind, and potentially kept in cache.
Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship. Using something like flatbuffers would have made a lot more sense.
At Google protobuf is the veritable "once I have a hammer everything looks like a nail", to the point where I've seen protobufs in JSON payloads, or vice versa, or even deeper... protobof in JSON inside a Chromium Mojo payload... because, well, how good could it be without protobuf?