Could you share some more about that very optimized binary protocol? I know ther...

blacklion · on Sept 7, 2021

It is not "our" protocol; it is a protocol designed by the exchange, and we need to support it because we cannot change it :). Simple binary messages with binary-encoded numbers, etc. No string parsing, no syntax, nothing like that; only bytes and offsets. Think of a TCP header, for example.

JSON is very inefficient both in bytes (a 32-bit price is 4 bytes in binary but can be 7+ bytes as a string; think "1299.99", for example) and in CPU. Parsing "1299.99" burns a lot of cycles, whereas if the price is a number of cents stored as a native 4-byte integer, you need at most three shifts and three ORs when you have to change endianness, and in most cases it is a simple 4-byte memory copy, 1-2 CPU cycles.
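To make the comparison concrete, here is a rough Python sketch of the three paths: the "native" 4-byte read, the hand-assembled endianness swap, and a text parser for "1299.99". It assumes a hypothetical field holding the price as a 32-bit count of cents; Python obviously won't show the cycle counts, but it does show how much more work per byte the text path does.

```python
import struct

# Hypothetical field: a price stored as a signed 32-bit count of cents.
PRICE_CENTS = 129999                          # $1299.99
as_text = b"1299.99"                          # 7 bytes on the wire as ASCII
as_binary = struct.pack("<i", PRICE_CENTS)    # 4 bytes, little-endian

def decode_native(buf: bytes, offset: int = 0) -> int:
    """Wire order assumed to match the host (little-endian): effectively a 4-byte copy."""
    return struct.unpack_from("<i", buf, offset)[0]

def decode_swapped(buf: bytes, offset: int = 0) -> int:
    """Big-endian field assembled by hand: three shifts and three ORs (unsigned result)."""
    return ((buf[offset] << 24) | (buf[offset + 1] << 16)
            | (buf[offset + 2] << 8) | buf[offset + 3])

def parse_text_cents(s: bytes) -> int:
    """Parse ASCII such as b'1299.99' into cents (assumes at most 2 decimal places)."""
    cents = 0
    frac_digits = -1                  # -1 means the '.' has not been seen yet
    for ch in s:                      # every character must be inspected
        if ch == ord("."):
            frac_digits = 0
            continue
        cents = cents * 10 + (ch - ord("0"))
        if frac_digits >= 0:
            frac_digits += 1
    frac_digits = max(frac_digits, 0)
    return cents * 10 ** (2 - frac_digits)   # pad "1299.9" or "1299" out to cents
```

The text parser also has to deal with missing decimals, signs, exponents and malformed input in real life, none of which exists on the binary path.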

With a binary protocol, you can skip fields you are not interested in as simply as "offset = offset + <field-size>" (where <field-size> is a compile-time constant!), whereas with JSON you need to parse the whole thing anyway.
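A minimal sketch of skipping by constant offsets. The message layout below is hypothetical (not any real exchange's spec); in Python the module-level size constants stand in for the compile-time constants a C or C++ decoder would have folded away.

```python
import struct

# Hypothetical fixed-layout message, big-endian, no padding:
#   msg_type  1 byte   ASCII
#   timestamp 8 bytes  uint64
#   order_ref 8 bytes  uint64
#   side      1 byte   ASCII 'B' or 'S'
#   shares    4 bytes  uint32
#   price     4 bytes  uint32, 4 implied decimals
MSG_TYPE_SIZE = 1
TIMESTAMP_SIZE = 8
ORDER_REF_SIZE = 8
SIDE_SIZE = 1
SHARES_SIZE = 4
PRICE_SIZE = 4
MSG_SIZE = (MSG_TYPE_SIZE + TIMESTAMP_SIZE + ORDER_REF_SIZE
            + SIDE_SIZE + SHARES_SIZE + PRICE_SIZE)

def read_shares_and_price(msg: bytes) -> tuple[int, int]:
    """Decode only shares and price; the fields before them are never looked at."""
    offset = 0
    offset += MSG_TYPE_SIZE    # skip: not interested
    offset += TIMESTAMP_SIZE   # skip
    offset += ORDER_REF_SIZE   # skip
    offset += SIDE_SIZE        # skip
    shares = struct.unpack_from(">I", msg, offset)[0]
    offset += SHARES_SIZE
    price = struct.unpack_from(">I", msg, offset)[0]
    return shares, price

# Usage: build a sample message with made-up values and read two fields from it.
sample = struct.pack(">cQQcII", b"A", 34_200_000_000_000, 42, b"B", 100, 12_999_900)
```

A JSON decoder has no equivalent shortcut: it must scan every byte of the skipped fields just to find where the next key starts.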

The difference between converting a binary packet into an internal data structure and parsing JSON carrying the same data into the same structure can easily be ten-fold. You also need to be very creative to parse JSON without additional memory allocations (it is possible, but the code becomes very dirty and fragile), and memory allocation and deallocation cost a lot, in both GC languages and languages with manual memory management.

localhost · on Sept 8, 2021

Curious whether the binary protocol uses a floating-point or a fixed-point representation. Or is floating point, with its rounding issues, sufficient for the protocol's needs?

andylynch · on Sept 8, 2021

Not GP, but familiar with these protocols. They use fixed point extensively; I can't even think of an exchange protocol that would use floating point, since the rounding issues would cause unnecessary and expensive problems.
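A tiny illustration of the rounding problem: ten fills at $0.10 each, totalled once as binary floats and once as integer cents (the fixed-point style these protocols use). The prices are made up; the float behaviour is standard IEEE 754 double arithmetic.

```python
# Ten hypothetical fills at $0.10 each.
fills_float = [0.10] * 10   # binary floating point: 0.10 is not exactly representable
fills_cents = [10] * 10     # fixed point: integer count of cents

float_total = sum(fills_float)   # accumulates representation error
cents_total = sum(fills_cents)   # exact
```

`float_total` ends up just below 1.0, so an equality check against $1.00 fails, while the integer total is exactly 100 cents.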

This is typical (from NASDAQ http://www.nasdaqtrader.com/content/technicalsupport/specifi... ):

Prices are integer fields. When converted to a decimal format, prices are in fixed point format with 6 whole number places followed by 4 decimal digits. The maximum price in OUCH 4.2 is $199,999.9900 (decimal, 7735939C hex). When entering market orders for a cross, use the special price of $214,748.3647 (decimal, 7FFFFFFF hex).
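A sketch of converting between the quoted integer field and a decimal price, assuming exactly the 4 implied decimal places described in the OUCH 4.2 text above. The helper names are hypothetical; the two constants are the hex values from the quote.

```python
from decimal import Decimal

IMPLIED_DECIMALS = 4  # per the quoted OUCH 4.2 text

def price_from_wire(raw: int) -> Decimal:
    """Interpret the integer price field as fixed point with 4 implied decimals."""
    return Decimal(raw).scaleb(-IMPLIED_DECIMALS)

def price_to_wire(price: Decimal) -> int:
    """Convert a decimal price to the integer field, rejecting sub-tick precision."""
    raw = price.scaleb(IMPLIED_DECIMALS)
    if raw != raw.to_integral_value():
        raise ValueError(f"{price} has more than {IMPLIED_DECIMALS} decimal places")
    return int(raw)

MAX_PRICE_RAW = 0x7735939C     # $199,999.9900 per the quoted text
MARKET_CROSS_RAW = 0x7FFFFFFF  # $214,748.3647, the special market-order price for a cross
```

Note that 0x7FFFFFFF is simply the largest signed 32-bit integer, which is why the special value lands on such an odd-looking dollar amount.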

mschuster91 · on Sept 8, 2021

> The maximum price in OUCH 4.2 is $199,999.9900

For NASDAQ it seems to have been something around $430k/share... Buffett's BRK shares threatened to hit that limit a couple of months ago: https://news.ycombinator.com/item?id=27044044
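The ~$430k figure is what you get if the price field is an unsigned 32-bit integer with 4 implied decimals; the signed variant gives the $214,748.3647 value from the OUCH quote. This is just arithmetic under that assumption, not something taken from the linked thread.

```python
from decimal import Decimal

IMPLIED_DECIMALS = 4  # assumed, matching the OUCH text quoted above

def max_price(bits: int, signed: bool) -> Decimal:
    """Largest price representable by an integer field of the given width."""
    top = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return Decimal(top).scaleb(-IMPLIED_DECIMALS)
```

Widening the field to 64 bits pushes the ceiling far beyond any realistic share price, at the cost of 4 extra bytes per price.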

nly · on Sept 8, 2021

Many exchanges use floating point, even in the Nasdaq technology stack.

X-Stream feeds do, for example.

blacklion · on Sept 9, 2021

Most of them use decimal fixed point. Sometimes the exponent (a decimal one, not a binary one!) is fixed per protocol, sometimes per instrument, and sometimes per message; it depends on the exchange.
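A sketch of the three variants, where value = mantissa × 10^exponent and only the source of the exponent changes. The exponent values and instrument symbols are hypothetical.

```python
from decimal import Decimal

def to_decimal(mantissa: int, exponent: int) -> Decimal:
    """value = mantissa * 10**exponent, computed exactly (decimal, not binary, exponent)."""
    return Decimal(mantissa).scaleb(exponent)

# 1) Exponent fixed for the whole protocol (hypothetical value).
PROTOCOL_EXPONENT = -4

# 2) Exponent fixed per instrument, e.g. loaded from reference data (hypothetical entries).
INSTRUMENT_EXPONENT = {"ABC": -2, "EURUSD": -5}

price_a = to_decimal(12_999_900, PROTOCOL_EXPONENT)
price_b = to_decimal(129_999, INSTRUMENT_EXPONENT["ABC"])

# 3) Exponent carried in each message right next to the mantissa.
msg_mantissa, msg_exponent = 108_345, -5
price_c = to_decimal(msg_mantissa, msg_exponent)
```

Because the exponent is a power of ten, prices like 1299.99 are represented exactly no matter which variant the exchange picks.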

joering2 · on Sept 8, 2021

Thanks for the writeup!

nostrademons · on Sept 7, 2021

Some Googling turned up this protocol descriptor:

https://uploads-ssl.webflow.com/5ba40927ac854d8c97bc92d7/5bf...

If you're optimizing for latency, JSON is pretty terrible, but most people who use it are optimizing for interoperability and ease of development. It works just fine for that, and you can recover decent bandwidth just by compressing it.
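A quick way to see the bandwidth point: serialize some made-up ticker records as JSON and run them through zlib. The data is synthetic, so actual ratios on a real feed will differ; the repeated keys are what compress so well.

```python
import json
import zlib

# Synthetic ticker records (made-up data, not from any real venue).
ticks = [
    {"symbol": "BTC-USD", "side": "buy" if i % 2 else "sell",
     "price": f"{40000 + i % 50}.{i % 100:02d}", "size": "0.01"}
    for i in range(1000)
]

raw = json.dumps(ticks, separators=(",", ":")).encode()
packed = zlib.compress(raw, 6)          # trade CPU time for bytes on the wire
restored = json.loads(zlib.decompress(packed))
```

The catch is that compression adds CPU work on both ends, which is fine for the interoperability case but moves you further away from the latency case.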

paraph1n · on Sept 7, 2021

There are many binary encoding protocols. A popular one is protobufs[1], which is used by gRPC.

[1]: https://developers.google.com/protocol-buffers
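One building block of the protobuf wire format is the base-128 varint: 7 payload bits per byte, least-significant group first, with the high bit meaning "more bytes follow". Below is a hand-rolled sketch for non-negative integers only; it is not the protobuf library's API.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint encoding of a non-negative integer."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)   # continuation bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)

def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Varints keep small numbers small, but unlike the fixed-layout messages discussed above, the position of a field depends on the lengths of the values before it, so you can't skip ahead by a constant offset.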

blacklion · on Sept 7, 2021

"old skool" exchanges uses either FIX (old and really vernose), FAST (binary encoding for FIX) or custom fixed-layout protocols.

Most big US exchanges use custom fixed-layout protocols, where each message is described in the documentation, but not in a machine-readable way. European ones still use FAST.

I haven't seen FIX in the wild for data feeds, but brokers use it to submit orders to exchanges (our company didn't do that part; we only consume feeds).

I don't know why, but all crypto exchanges use JSON, not protobufs or something similar, and they don't publish any formal schemas.

Fun fact: one crypto exchange put gzipped, base64-encoded JSON inside JSON that it pushed over a WebSocket, to save bandwidth. IMHO, that is the peak of bad design.
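A sketch of that wrapping scheme and why it hurts: the receiver parses JSON, decodes base64, gunzips, then parses JSON a second time, and base64 adds about a third back onto the compressed size. The outer field name "data" is hypothetical.

```python
import base64
import gzip
import json

def wrap(payload: dict) -> str:
    """Sender side: JSON -> gzip -> base64 -> embedded as a string inside more JSON."""
    inner = gzip.compress(json.dumps(payload).encode())
    return json.dumps({"data": base64.b64encode(inner).decode("ascii")})

def unwrap(frame: str) -> dict:
    """Receiver side: parse JSON, base64-decode, gunzip, parse JSON again."""
    outer = json.loads(frame)                                # parse #1
    inner = gzip.decompress(base64.b64decode(outer["data"]))
    return json.loads(inner)                                 # parse #2

book = {"bids": [["1299.99", "0.5"]] * 50, "asks": [["1300.01", "0.7"]] * 50}
compressed = gzip.compress(json.dumps(book).encode())
encoded = base64.b64encode(compressed)
```

Base64 always emits 4 output bytes per 3 input bytes (rounded up). WebSocket already has the permessage-deflate extension (RFC 7692) for compressing frames without the base64 step and the second JSON layer.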

blibble · on Sept 7, 2021

there's still ascii FIX floating around on the market data side for a few esoteric venues

FAST is not particularly common in Europe

the large European venues use fixed-width binary encoding (LSE group, Euronext, CBOE Europe)

blacklion · on Sept 7, 2021

Eurex uses FAST for sure, but I may be wrong about "common".

gpderetta · on Sept 8, 2021

Eurex/XETRA EOBI no longer uses FAST. It is about 8 years old at this point, but IIRC some products are still on the older protocol.

nly · on Sept 8, 2021

Euronext uses SBE specifically.
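For a flavour of SBE (Simple Binary Encoding): messages start with a small fixed header, commonly blockLength, templateId, schemaId and version, each a uint16 in SBE's default little-endian byte order. A sketch of reading that header follows; a specific venue's schema may define its header differently, so treat the layout as an assumption.

```python
import struct

# Common SBE message header: blockLength, templateId, schemaId, version (uint16 each, little-endian).
SBE_HEADER = struct.Struct("<HHHH")

def read_sbe_header(buf: bytes, offset: int = 0) -> tuple[int, int, int, int, int]:
    """Return (block_length, template_id, schema_id, version, root_block_offset)."""
    block_length, template_id, schema_id, version = SBE_HEADER.unpack_from(buf, offset)
    return block_length, template_id, schema_id, version, offset + SBE_HEADER.size

# Usage with made-up values: a header followed by a 26-byte fixed root block.
frame = SBE_HEADER.pack(26, 1, 2, 3) + bytes(26)
header = read_sbe_header(frame)
```

The templateId selects which fixed layout to decode, and blockLength lets a decoder jump past the root block even if a newer schema version has appended fields it doesn't know about.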

rewq4321 · on Sept 7, 2021

And msgpack, if you want an order of magnitude faster serialization/deserialization and can put up with worse compression (I think mainly due to schema overhead, since protobuf messages don't store the schema?)

https://msgpack.org/index.html

Good protobuf vs msgpack comparison: https://medium.com/@hugovs/the-need-for-speed-experimenting-...
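One likely source of the size difference is visible in the encodings themselves: msgpack is self-describing, so every message carries its key names, while protobuf only sends a field number and wire type because the schema lives in the .proto file. Both encoders below are hand-rolled subsets for illustration (msgpack fixmap/fixstr/positive fixint; protobuf fields 1-15 with wire type 0 and values under 128), not either library's API.

```python
def msgpack_small_map(d: dict[str, int]) -> bytes:
    """MessagePack subset: fixmap (<=15 entries), fixstr keys (<=31 bytes), positive fixint values."""
    if len(d) > 15:
        raise ValueError("fixmap holds at most 15 entries")
    out = bytearray([0x80 | len(d)])            # fixmap header
    for key, value in d.items():
        k = key.encode("utf-8")
        if len(k) > 31 or not 0 <= value <= 0x7F:
            raise ValueError("outside the subset handled by this sketch")
        out.append(0xA0 | len(k))               # fixstr header
        out += k                                # the key name goes on the wire every time
        out.append(value)                       # positive fixint
    return bytes(out)

def protobuf_small_fields(fields: dict[int, int]) -> bytes:
    """Protobuf subset: field numbers 1..15, wire type 0 (varint), values < 128."""
    out = bytearray()
    for number, value in fields.items():
        if not 1 <= number <= 15 or not 0 <= value <= 0x7F:
            raise ValueError("outside the subset handled by this sketch")
        out.append((number << 3) | 0)           # tag byte: field number + wire type
        out.append(value)                       # single-byte varint
    return bytes(out)

mp = msgpack_small_map({"price": 100, "size": 5})
pb = protobuf_small_fields({1: 100, 2: 5})      # field 1 = price, 2 = size in a hypothetical .proto
```

Same two numbers, 14 bytes versus 4: the difference is almost entirely the key names.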

wffurr · on Sept 7, 2021

https://www.opraplan.com/datafeeds

andylynch · on Sept 8, 2021

Have a look at NASDAQ ITCH, OUCH, and RASH (these are the real names; the story I heard is that their original author didn't like the usual corporate-style brand names and wanted certain people to squirm when talking about them).