Google's documentation claims that protocol buffers are designed to be fast, so I tested them out and found that their Python implementation was too slow to use. I posted a question to the discussion forum and found out that it's a known (but undocumented!!!) problem:

http://groups.google.com/group/protobuf/browse_thread/thread...

Recently I tried out keyczar, Google's crypto toolkit, and found that its Python implementation was about 100x too slow because it uses a slow random string generator:

http://groups.google.com/group/keyczar-discuss/browse_thread...

It's hard to see how both of these problems could have passed internal benchmarking and made it into live code. Maybe Google isn't using its own Python implementations internally.
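For context on how a random string generator ends up 100x too slow: I don't know keyczar's internals offhand, but the classic Python version of this mistake is producing one byte at a time in a Python-level loop instead of making a single call into the OS CSPRNG. A minimal sketch of the pattern (the function names are mine, not keyczar's):

    import os
    import random
    import timeit

    def slow_rand_bytes(n):
        # One Python-level call per byte: the classic accidental bottleneck.
        # (The random module isn't a CSPRNG either, so a crypto library
        # shouldn't be using it in the first place.)
        return bytes(random.getrandbits(8) for _ in range(n))

    def fast_rand_bytes(n):
        # A single call into the OS CSPRNG.
        return os.urandom(n)

    print(timeit.timeit(lambda: slow_rand_bytes(1024), number=1000))
    print(timeit.timeit(lambda: fast_rand_bytes(1024), number=1000))

The single-call version typically wins by well over an order of magnitude.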
The internal standard, when the speed of protocol buffer operations matters, is to use the C++ protocol buffer code and wrap it for Python with SWIG. Unfortunately, this is really easy internally thanks to protocol buffer integration with our build system, but not easy externally, since most people don't have a build phase for their Python projects.
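For those who haven't seen that setup: the Python side ends up looking roughly like this, with all the heavy lifting done in C++ behind a SWIG-generated module. The module name fast_pb and its API below are made up for illustration; the real names depend on your .proto file and SWIG configuration.

    # Hypothetical usage of a SWIG-wrapped C++ protobuf message from Python.
    # "fast_pb" and its methods are illustrative, not a real shipped module.
    import fast_pb

    msg = fast_pb.Person()
    msg.set_name("Alice")            # setters call straight into C++
    data = msg.SerializeToString()   # serialization runs in C++, not Python

    parsed = fast_pb.Person()
    parsed.ParseFromString(data)     # parsing also stays in C++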
Large systems tend to push you toward tradeoffs other people wouldn't make: scalability over efficiency. Bisection bandwidth is the scarcest resource in a large cluster, so I imagine protocol buffers have had much more attention paid to saving bytes than to encoding/decoding speed. Latency-critical stuff probably doesn't use Python anyhow.
Protobufs are actually pretty fast to encode and decode (in the neighborhood of 200-300 MB/s on my Core 2 desktop, when using the C++ bindings).
It's just the Python implementation that is slow. I'm working on a Python implementation that will be much faster. It's really unfortunate that Protocol Buffers are getting a bad rap due to the current Python implementation.
My friend Josh (a Google engineer) is working on a Protocol Buffer implementation that's explicitly designed for (a) efficiency and (b) clean integration with dynamic languages like Python: http://wiki.github.com/haberman/upb
It's still a work in progress, but the code is there (including tests and some documentation) and you can start working on bindings for your favorite language...
Conclusion: if you're developing for the Android platform, you may want to use protocol buffers; otherwise, use JSON.
I'm quite surprised at how large that library is and how badly it performs, especially since it does less translation than a comparable JSON serializer would.
For a complete comparison it would have been nice to include XML as well.
It should probably be noted that the Python implementation currently works quite differently from the C++ and Java implementations.
The C++ and Java versions generate code for each message type defined in your .proto file (in fact, there's an option that lets you optimize for speed). The Python version, on the other hand, creates an object representing the proto file and reflects on it for serialization/deserialization.
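You can see the reflection-based approach with nothing but the protobuf package installed, since descriptor_pb2 ships with the library (so no protoc step is needed for this demo). A small sketch; the import-time class synthesis described in the comments applies to the pure-Python backend:

    # Runnable with just "pip install protobuf".
    from google.protobuf import descriptor_pb2

    msg = descriptor_pb2.FileDescriptorProto(name="example.proto")

    # The message class was synthesized at import time from its descriptor,
    # and serialization walks that descriptor at runtime -- unlike the
    # generated-code paths in C++ and Java.
    print(type(msg).__name__)                       # FileDescriptorProto
    print(list(msg.DESCRIPTOR.fields_by_name)[:3])  # fields found via reflection
    print(msg.SerializeToString())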
I wonder whether the author used the 'optimize_for=SPEED' parameter in Protocol Buffers. Seems relevant, and since the author didn't mention it, I suppose he probably didn't.
I personally have a hard time accepting that PB is actually orders of magnitude slower than JSON, especially given that PB prides itself on its efficiency.
I didn't use the optimize_for=SPEED option, but I'll do that right now and update it in half an hour.
EDIT: I just added the option optimize_for=SPEED to the .proto file, and it increases the speed of the Protocol Buffers by around 5% (still about 10x slower than JSON with Python).
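For anyone reproducing this: the option is just the line "option optimize_for = SPEED;" in the .proto file, and it mainly drives the C++ and Java code generators, which fits the tiny effect seen here. Below is a minimal sketch of the kind of round-trip benchmark being discussed, assuming a hypothetical person.proto (with a Person message holding name, id, and email fields) compiled via "protoc --python_out=." into person_pb2:

    import json
    import timeit

    import person_pb2  # hypothetical generated module

    def pb_roundtrip():
        p = person_pb2.Person(name="Alice", id=123, email="alice@example.com")
        data = p.SerializeToString()
        person_pb2.Person.FromString(data)

    def json_roundtrip():
        s = json.dumps({"name": "Alice", "id": 123, "email": "alice@example.com"})
        json.loads(s)

    print("protobuf:", timeit.timeit(pb_roundtrip, number=10000))
    print("json:    ", timeit.timeit(json_roundtrip, number=10000))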
> I personally have a hard time accepting that PB is actually orders of magnitude slower than JSON
It's not always slower, but in certain situations, yes. It seems that the Python implementation is especially slow; it would be nice to see the results with C++. Anyone care to give it a shot?
Interesting. I am writing an Android app that uses JSON. Using Protocol Buffers never even occurred to me. (JSON works fine, and I am not going to change for this app, but I will definitely consider it in the future.)