Could you share some more about that very optimized binary protocol? I know there are ways to be more efficient than JSON but since you call it crappy, your solution must be much much better. Honestly interested to readup more.
It is not "our" protocol, it is protocol designed by exchange and we need to support it, as we can not change it :).
Simple binary messages, with binary encoded numbers, etc. No string parsing, no syntax, nothing like this, only bytes and offsets. Think about TCP header, for example.
JSON is very inefficient both in bytes (32 bit price is 4 bytes in binary and could be 7+ bytes as string, think "1299.99" for example) and CPU: to parse "1299.99" you need burn a lot of cycles, and if it is number of cents stored as native 4-byte number you need 3 shifts and 4 binary ors at most, if you need to change endianness, and in most cases it is simple memory copy of 4 bytes, 1-2 CPU cycle.
When you have binary protocol, you could skip fields which you are not interested in as simple as "offset = offset + <filed-size>" (where <filed-size> is compile-time constant!) and in JSON you need to parse whole thing anyway.
Difference between converting binary packet to internal data structure and parsing JSON with same data to same structure could be ten-fold easily, and you need to be very creative to parse JSON without additional memory allocations (it is possible, but code becomes very dirty and fragile), and memory allocation and/or deallocation costs a lot, both in GC languages and languages with manual memory management.
Curious if the binary protocol uses floating point or a fixed point representation? Or is floating point with its rounding issues sufficient for the protocol's needs?
No GP but familiar with these protocols.
They use fixed point extensively; I can't even thing of an exchange protocol which would use floating point since the rounding issues would cause unnecessary and expensive problems.
Prices are integer fields. When converted to a decimal format, prices are in fixed point format with 6 whole number places followed by 4 decimal digits. The maximum price in OUCH 4.2 is $199,999.9900 (decimal, 7735939C hex). When entering market orders for a cross, use the special price of $214,748.3647 (decimal, 7FFFFFFF hex).
For NASDAQ it seems to have been something around 430k / share... Buffett's BRK shares threatened to hit that limit a couple months ago: https://news.ycombinator.com/item?id=27044044
Most of them use decimal fixed point. Sometimes exponent (decimal, not binary one!) is fixed per-protocol, sometimes per-instrument and sometimes per-message, it depends on exchange.
If you're optimizing for latency JSON is pretty terrible, but most people who use it are optimizing for interoperability and ease of development. It works just fine for that, and you can recover decent bandwidth just by compressing it.
"old skool" exchanges uses either FIX (old and really vernose), FAST (binary encoding for FIX) or custom fixed-layout protocols.
Most big USA exchanges uses custom fixed-layout protocols, where each message is described in documentation, but not in machine-readable way. European ones still use FAST.
I didn't seen FIX in the wild for data feeds, but it is used for brokers, to submit orders to exchange (our company didn't do this part, we only consume feeds).
I don't know why, but all Crypto Exchanges use JSON, not protobufs or something like this, and didn't publish any formal schemes.
Fun fact: one crypto exchange put GZIP'ed and base64'ed JSON data into JSON which pushed to websocket, to save bandwidth. IMHO, it is peak of bad design.
And msgpack if you want an order of magnitude faster serialization/deserialisation and can put up with worse compression (I think mainly due to schema overhead since protobuf files don't store the schema?)
Have a look at NASDAQ ITCH, OUCH, and RASH
(these are the real names, the story I heard is the original author of them didn't like the usual corporate style brand names and wanted certain people to squirm when talking about them).