Google's documentation claims that protocol buffers are designed to be fast, so I tested them out and found that their Python implementation was too slow to use. I posted a question to the discussion forum and found out that it's a known (but undocumented!!!) problem:

http://groups.google.com/group/protobuf/browse_thread/thread...

Recently I tried out keyczar, Google's crypto toolkit, and found that its Python implementation was about 100x too slow because it uses a slow random string generator:

http://groups.google.com/group/keyczar-discuss/browse_thread...

It's hard to see how both of these problems could have passed internal benchmarking and made it into live code. Maybe Google isn't using its own Python implementations internally.
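For context on how a random string generator ends up 100x too slow: I don't know keyczar's internals offhand, but the classic Python version of this mistake is producing one byte at a time in a Python-level loop instead of making a single call into the OS CSPRNG. A minimal sketch of the pattern (the function names are mine, not keyczar's):

    import os
    import random
    import timeit

    def slow_rand_bytes(n):
        # One Python-level call per byte: the classic accidental bottleneck.
        # (The random module isn't a CSPRNG either, so a crypto library
        # shouldn't be using it in the first place.)
        return bytes(random.getrandbits(8) for _ in range(n))

    def fast_rand_bytes(n):
        # A single call into the OS CSPRNG.
        return os.urandom(n)

    print(timeit.timeit(lambda: slow_rand_bytes(1024), number=1000))
    print(timeit.timeit(lambda: fast_rand_bytes(1024), number=1000))

The single-call version typically wins by well over an order of magnitude.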
The internal standard, when the speed of protocol buffer operations matters, is to use the C++ protocol buffer code and wrap it for Python with SWIG. Unfortunately, this is really easy internally thanks to protocol buffer integration with our build system, but not easy externally, since most people don't have a build phase for their Python projects.
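For those who haven't seen that setup: the Python side ends up looking roughly like this, with all the heavy lifting done in C++ behind a SWIG-generated module. The module name fast_pb and its API below are made up for illustration; the real names depend on your .proto file and SWIG configuration.

    # Hypothetical usage of a SWIG-wrapped C++ protobuf message from Python.
    # "fast_pb" and its methods are illustrative, not a real shipped module.
    import fast_pb

    msg = fast_pb.Person()
    msg.set_name("Alice")            # setters call straight into C++
    data = msg.SerializeToString()   # serialization runs in C++, not Python

    parsed = fast_pb.Person()
    parsed.ParseFromString(data)     # parsing also stays in C++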
Large systems tend to push you toward tradeoffs other people wouldn't make: scalability over efficiency. Bisection bandwidth is the scarcest resource in a large cluster, so I imagine protocol buffers have had much more attention paid to saving bytes than to encoding/decoding speed. Latency-critical stuff probably doesn't use Python anyhow.
Protobufs are actually pretty fast to encode and decode (in the neighborhood of 200-300 MB/s on my Core 2 desktop, when using the C++ bindings).
It's just the Python implementation that is slow. I'm working on a Python implementation that will be much faster. It's really unfortunate that Protocol Buffers are getting a bad rap due to the current Python implementation.
My friend Josh (a Google engineer) is working on a Protocol Buffer implementation that's explicitly designed for (a) efficiency and (b) clean integration with dynamic languages like Python: http://wiki.github.com/haberman/upb
It's still a work in progress, but the code is there (including tests and some documentation) and you can start working on bindings for your favorite language...
Conclusion: if you're developing for the Android platform, you may want to use protocol buffers; otherwise, use JSON.
I'm quite surprised at how large that library is and how badly it performs, especially since it does less translation than a comparable JSON serializer would.
For a complete comparison it would have been nice to include XML as well.
It should probably be noted that the Python implementation currently works quite differently from the C++ and Java implementations.
The C++ and Java versions generate code for each message type defined in your .proto file (in fact, there's an option that lets you optimize for speed). The Python version, on the other hand, creates an object representing the proto file and reflects on it for serialization/deserialization.
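You can see the reflection-based approach with nothing but the protobuf package installed, since descriptor_pb2 ships with the library (so no protoc step is needed for this demo). A small sketch; the import-time class synthesis described in the comments applies to the pure-Python backend:

    # Runnable with just "pip install protobuf".
    from google.protobuf import descriptor_pb2

    msg = descriptor_pb2.FileDescriptorProto(name="example.proto")

    # The message class was synthesized at import time from its descriptor,
    # and serialization walks that descriptor at runtime -- unlike the
    # generated-code paths in C++ and Java.
    print(type(msg).__name__)                       # FileDescriptorProto
    print(list(msg.DESCRIPTOR.fields_by_name)[:3])  # fields found via reflection
    print(msg.SerializeToString())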
I wonder whether the author used the 'optimize_for=SPEED' parameter in Protocol Buffers. Seems relevant, and since the author didn't mention it, I suppose he probably didn't.
I personally have a hard time accepting that PB is actually orders of magnitude slower than JSON, especially given that PB prides itself on its efficiency.
I didn't use the optimize_for=SPEED option, but I'll do that right now and update it in half an hour.
EDIT: I just added the option optimize_for=SPEED to the .proto file, and it increases the speed of the Protocol Buffers by around 5% (still about 10x slower than JSON with Python).
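For anyone reproducing this: the option is just the line "option optimize_for = SPEED;" in the .proto file, and it mainly drives the C++ and Java code generators, which fits the tiny effect seen here. Below is a minimal sketch of the kind of round-trip benchmark being discussed, assuming a hypothetical person.proto (with a Person message holding name, id, and email fields) compiled via "protoc --python_out=." into person_pb2:

    import json
    import timeit

    import person_pb2  # hypothetical generated module

    def pb_roundtrip():
        p = person_pb2.Person(name="Alice", id=123, email="alice@example.com")
        data = p.SerializeToString()
        person_pb2.Person.FromString(data)

    def json_roundtrip():
        s = json.dumps({"name": "Alice", "id": 123, "email": "alice@example.com"})
        json.loads(s)

    print("protobuf:", timeit.timeit(pb_roundtrip, number=10000))
    print("json:    ", timeit.timeit(json_roundtrip, number=10000))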
> I personally have a hard time accepting that PB is actually orders of magnitude slower than JSON
It's not always slower, but in certain situations, yes. It seems that the Python implementation is especially slow; it would be nice to see the results with C++. Anyone care to give it a shot?
Interesting. I am writing an Android app that uses JSON. Using Protocol Buffers never even occurred to me. (JSON works fine, and I am not going to change for this app, but I will definitely consider it in the future.)