[in Go] seems relevant enough to be added to the title: other languages could fairly easily bind to existing high-performance JSON parsers like sajson or simdjson, but doing so from Go incurs all the usual cgo issues.
What are the usual cgo issues? I've used it a fair bit and haven't had any general problems.
There are specific problems, like not being able to use syscalls that must be made from a single-threaded process (a royal pain for container managers, since setns() is one of them), but binding to a library like simdjson wouldn't have this problem.
> What are the usual cgo issues? I've used it a fair bit and haven't had any general problems.
1. cgo calls have much higher overhead than regular function calls, so for small documents you'd likely lose performance rather than gain it, and even for large ones it might be terrible depending on how you read from the parser; callback-based libraries are worse still, since calling Go from C is even slower (see the micro-benchmark sketch after this list)
2. concurrency can suffer a lot, as a cgo call prevents the corresponding goroutine from being switched out, tying up a scheduler thread
3. cgo complicates builds, especially cross-compilation
4. it also makes deployments more complicated if you were relying on just syncing a statically linked binary
5. this might have improved since, but most of the built-in Go development tools used to be unable to cross the cgo barrier, and non-Go devtools generally don't support Go
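A quick way to see point 1 for yourself is a micro-benchmark comparing a trivial cgo call against an equivalent Go call. This is only a sketch (the C function is made up, and it needs CGO_ENABLED=1 plus a C toolchain), not anything from the article:

```go
package main

/*
static int c_identity(int x) { return x; }
*/
import "C"

import (
	"fmt"
	"time"
)

//go:noinline
func goIdentity(x int) int { return x }

func main() {
	const n = 1_000_000
	var sink int

	// Time n round trips across the cgo boundary.
	start := time.Now()
	for i := 0; i < n; i++ {
		sink = int(C.c_identity(C.int(i)))
	}
	perCgo := time.Since(start) / n

	// Time n plain (non-inlined) Go calls for comparison.
	start = time.Now()
	for i := 0; i < n; i++ {
		sink = goIdentity(i)
	}
	perGo := time.Since(start) / n

	_ = sink
	fmt.Printf("cgo call: ~%v each, Go call: ~%v each\n", perCgo, perGo)
}
```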
There's a latency overhead to cgo calls that isn't really on the list of things to be optimised. It renders the language unusable for wrapping around OpenGL and the like in a (complex) gaming context.
In the main scanner function, a few minor performance-squeezing notes I'd like to test:
1. Moving "length := 0" above the for loop, since it's reassigned in all the needed cases.
2. To avoid an extra "if whitespace[c]" check, including the whitespace cases in the main switch statement, even if it means duplicating or moving "s.br.release".
Or, if the separate check must stay, using a switch statement vs a lookup ("whitespace[c]") (see the benchmark sketch after this list).
3. In the switch statement, using multiple assignment (in most cases).
4. In the String and default cases, inlining the length assignment within the if statement.
5. Returning "s.br.window()[:length]" in each case vs breaking out of the switch statement to return; even though it's ugly, it avoids one step.
6. I'm curious if any performance could be gained by including more cases for common characters (A-Z, a-z, 0-9) to avoid the default case, i.e. testing whether there is a penalty for hitting the default case vs listing more cases, even if it's ugly.
7. Including additional cases for exact values to avoid extra function calls to "parseString(&s.br)" or "s.parseNumber()".
8. I'm curious whether, in some cases, peeking at the next character with a nested switch statement could avoid additional iterations or function calls to validate/release.
9. In the whitespace check, peeking for common JSON formatting patterns to avoid iterations, such as 2- or 4-space-indented JSON: a newline followed by tabs or spaces, etc. Or possibly establishing that the JSON is "probably2Spaced/probably4Spaced" and then peeking more efficiently?
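For the switch-vs-lookup sub-question in point 2, here is a self-contained benchmark sketch (the input and names are made up; the article's scanner isn't reproduced). Run it with `go test -bench=. -benchmem`:

```go
package wsbench

import "testing"

// 256-entry table, as in the lookup-based approach.
var whitespace = [256]bool{' ': true, '\t': true, '\n': true, '\r': true}

// Illustrative sample input with typical JSON formatting.
var input = []byte("{ \"key\": [1, 2, 3],\n    \"nested\": { \"a\": true } }")

func isWSSwitch(c byte) bool {
	switch c {
	case ' ', '\t', '\n', '\r':
		return true
	}
	return false
}

var sink int

func BenchmarkWhitespaceTable(b *testing.B) {
	for i := 0; i < b.N; i++ {
		n := 0
		for _, c := range input {
			if whitespace[c] {
				n++
			}
		}
		sink = n
	}
}

func BenchmarkWhitespaceSwitch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		n := 0
		for _, c := range input {
			if isWSSwitch(c) {
				n++
			}
		}
		sink = n
	}
}
```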
> 7. Including additional cases for exact values to avoid extra function calls to "parseString(&s.br)" or "s.parseNumber()".
I can see how you might choose some numbers to optimize for (1..10 for example) - but strings? You could of course do a frequency analysis of the test data - but would that help in general, beyond just cheating on the benchmark?
I guess you could try for "key" and "value", and maybe "id"? Possibly adding "email" and "name"?
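A hedged sketch of the frequency-analysis idea (the sample document here is made up): decode some representative data and count how often each object key appears, to see which exact values might be worth special-casing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// countKeys walks a decoded JSON value and tallies every object key it finds.
func countKeys(v interface{}, counts map[string]int) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			counts[k]++
			countKeys(child, counts)
		}
	case []interface{}:
		for _, child := range t {
			countKeys(child, counts)
		}
	}
}

func main() {
	doc := []byte(`[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "email": "b@x"}]`)

	var parsed interface{}
	if err := json.Unmarshal(doc, &parsed); err != nil {
		panic(err)
	}
	counts := map[string]int{}
	countKeys(parsed, counts)
	fmt.Println(counts) // e.g. map[email:1 id:2 name:2]
}
```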
Also from tfa regarding numbers:
> Scanner.parseNumber is slow because it visits its input twice; once at the scanner and a second time when it is converted to a float. I did an experiment and the first parse can be faster if we just look to find the termination of the number without validation, canada.json went from 650mb/s to 820mb/sec.
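A hedged sketch of that idea: find where a number token ends without validating it, and leave the real validation to strconv.ParseFloat on the second pass. This is only an illustration, not the article's implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// numberEnd returns the index just past the last byte that could belong to a
// JSON number starting at b[0]. It does no validation at all.
func numberEnd(b []byte) int {
	for i, c := range b {
		switch c {
		case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
			'-', '+', '.', 'e', 'E':
			// still inside the number
		default:
			return i
		}
	}
	return len(b)
}

func main() {
	buf := []byte(`-123.45e2,"next"`)
	end := numberEnd(buf)
	f, err := strconv.ParseFloat(string(buf[:end]), 64) // validation happens here
	fmt.Println(string(buf[:end]), f, err)
}
```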
I wrote the original simdjson code along with Daniel Lemire. The Go version of simdjson (it's a rewrite, not just a binding to the C++ code) is slower than the original simdjson but still 8-15x faster than encoding/json.
I don't know how the two compare, as I don't really know where the overheads happen in the Go version. Assuming the analogous case is decoding into an interface{}, the simdjson port would be considerably faster.
According to the article, encoding/json is particularly slow with respect to decoding due to allocations. Notably, the API makes it difficult (impossible?) to avoid these allocations. Do you know if simdjson is significantly faster in this regard? And if you don't know that, do you know if the decoding API is the same as with encoding/json?
From the OP: 'I believed that I could implement an efficient JSON parser based on my assumption that encoding/json was slower than it could be because of its API. It turned out that I was right, it looks like there is 2-3x performance in some unmarshalling paths and between 8x and 10x performance in tokenisation, if you’re prepared to accept a different API.'
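On the allocation point: encoding/json's streaming API returns json.Token, which is an interface{}, so string and number tokens get boxed on every call. A minimal benchmark sketch (not from the article) that shows this with `go test -bench=. -benchmem`:

```go
package tokalloc

import (
	"bytes"
	"encoding/json"
	"testing"
)

// Illustrative sample document.
var doc = []byte(`{"a": 1, "b": [true, "x", 2.5]}`)

// BenchmarkStdTokenizer walks the document with Decoder.Token, whose
// interface{}-based return type forces string and number tokens to be boxed.
func BenchmarkStdTokenizer(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dec := json.NewDecoder(bytes.NewReader(doc))
		for {
			if _, err := dec.Token(); err != nil {
				break
			}
		}
	}
}
```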
The cache line on x86 is 64 bytes. Your whitespace lookup table is way too big. At the very least, you can subtract (with overflow) '\t' and check that the character is not greater than ' ' before hitting the LUT.
The ASCII table is ripe for bit twiddling (I suspect it was organized with that in mind). You may find exploitable bit patterns in the whitespace chars.
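A hedged sketch of that bit-twiddling idea: JSON's four whitespace bytes (tab, LF, CR, space) all sit below 64, so a single uint64 mask can stand in for the 256-entry table:

```go
package main

import "fmt"

// wsMask has one bit set per JSON whitespace byte: '\t' (9), '\n' (10),
// '\r' (13) and ' ' (32). All four fall below bit 64, so one uint64 suffices.
const wsMask uint64 = 1<<'\t' | 1<<'\n' | 1<<'\r' | 1<<' '

func isWhitespace(c byte) bool {
	return c < 64 && wsMask&(1<<c) != 0
}

func main() {
	for _, c := range []byte{' ', '\t', '\n', '\r', 'a', '{'} {
		fmt.Printf("%q -> %v\n", c, isWhitespace(c))
	}
}
```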
Interesting approach. A different approach was taken with OjG (https://github.com/ohler55/ojg), which shows a significant performance improvement over the current Go JSON parser.
Just finished it last week. I'm pretty happy with the results but even more pleased with the JSONPath. It even works on regular types using reflection.
Deno is a JavaScript runtime built in Rust, so I guess it depends what you mean by "types". Definitely not strong typing, although the example could be rewritten in TypeScript, which Deno supports natively.
I use Deno sometimes, so it was of interest to me to compare it to the advancements presented here for Go.
I like Deno (v8 is written in C++, not Rust, for what it's worth); the question is just: is it doing the same work that the encoding/json benchmark is doing? What are we comparing?
You can see in the linked code that it just calls `const parsed = readJsonSync(fileName)`, which likely returns an untyped object. So this purely does parsing into an anonymous structure, not validation and mapping into something strongly typed.
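To make that distinction concrete on the Go side (a hedged sketch with made-up field names): decoding into interface{} is the analogue of the untyped readJsonSync result, while decoding into a struct also does the mapping into something strongly typed:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	doc := []byte(`{"name": "amy", "age": 32}`)

	var anon interface{} // untyped: ends up as a map[string]interface{}
	if err := json.Unmarshal(doc, &anon); err != nil {
		panic(err)
	}

	var typed user // strongly typed target: field types are checked while decoding
	if err := json.Unmarshal(doc, &typed); err != nil {
		panic(err)
	}

	fmt.Printf("%#v\n%#v\n", anon, typed)
}
```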
All this effort on building a better JSON parser for language 'X' would have been better spent on inventing a statically-typed JSON. (Type inference seems like a thing that should work well in this space.)
What you're describing exists: Protobuf, FlatBuffers, Cap'n Proto, XDR, Parquet, Arrow, Avro, ORC, SQLite. Indeed, if you use these formats you will be able to load the data into memory more quickly than with JSON.
But this article isn't about those protocols. It's about JSON.
You can do it with JSON too. Various statically-typed languages do JSON parsing straight into structs all the time. (e.g. in the Rust ecosystem, Serde is the most popular implementation of this concept.)
That doesn’t need to be a problem. In Rust, the most popular thing in this space is Serde, with the serde_json crate. It can cope with such things just fine; define a data type that is capable of representing all the possibilities (perhaps as simple as an enum with a variant for each style, or perhaps it normalises them into one), and then… well, you probably have to write a visitor to help deserialise, which isn’t trivial, especially when you’re used to just deriving those things and so you’re not so familiar with Serde internals. But it’s possible. https://serde.rs/string-or-struct.html gives one example. The length of the example may be daunting, but it’s actually not too bad.
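For readers coming from Go rather than Rust, here's a hedged sketch of the same "string or struct" idea using a custom UnmarshalJSON; the Build type is made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Build struct {
	Context    string `json:"context"`
	Dockerfile string `json:"dockerfile,omitempty"`
}

// UnmarshalJSON accepts either a bare string ("./app") or a full object.
func (b *Build) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		b.Context = s
		return nil
	}
	type plain Build // defined type drops the method set, avoiding recursion
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	*b = Build(p)
	return nil
}

func main() {
	var a, b Build
	_ = json.Unmarshal([]byte(`"./app"`), &a)
	_ = json.Unmarshal([]byte(`{"context": "./app", "dockerfile": "Dockerfile.dev"}`), &b)
	fmt.Printf("%+v\n%+v\n", a, b)
}
```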
ProtoBuf requires the document's schema to be known ahead of time and involves deserialization; Ion is more like JSON in that it's self-describing, but has a rich set of data types as well as annotations on fields to further describe meaning. FlatBuffers requires a schema and provides access to the data without parsing/unpacking.
On the other hand, if you're building a service and specifying its interface, rather than just specifying a data format, then there are tools like Smithy, gRPC, and Thrift.
JSON is popular because it's schemaless; storing and versioning schemas is its own nightmare of inconsistent and inefficient hacks.
However, in practice, JSON messages almost always have an implied schema. Figuring it out on the fly shouldn't be too hard - just apply the JIT and type inference technologies that compilers have already used for decades.
(Not a problem with an immediately obvious solution, but that's why we're here, no?)
> However, in practice, JSON messages almost always have an implied schema.
Typed JSON already exists in various attempts, e.g. TJSON.
Really I think the problem is that JSON is great when your schema is still in flux -- once it's public/stable, it makes sense to want to strictly encode the current schema.
The solution then is that you really want a way to trivially transition from "implied schema" to "strict schema" -- though a typed JSON won't help you too much, because the larger trouble is figuring out what your schema is in practice. I guess you really want either a code analysis tool to determine it, or more likely an automated tool that takes e.g. a list of JSON responses (probably based on a test suite/code coverage) and produces a typed JSON schema that you can simply drop into your public API.
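As a toy illustration of that last idea (entirely hypothetical tooling, not an existing project): walk a few captured responses and record the type seen at each top-level key. A real tool would recurse, merge optional/nullable fields, and emit JSON Schema or Go structs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Pretend these came from a test suite's recorded responses.
	samples := [][]byte{
		[]byte(`{"id": 1, "name": "a", "tags": ["x"]}`),
		[]byte(`{"id": 2, "name": "b", "tags": []}`),
	}

	schema := map[string]string{} // key -> inferred type name
	for _, s := range samples {
		var doc map[string]interface{}
		if err := json.Unmarshal(s, &doc); err != nil {
			panic(err)
		}
		for k, v := range doc {
			schema[k] = fmt.Sprintf("%T", v) // e.g. float64, string, []interface {}
		}
	}
	fmt.Println(schema)
}
```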
Given that almost all JSON applications are going to be I/O bound (reading from disk or, more likely, the network), what's the fascination with making super-fast JSON decoders, beyond the engineering challenge? Sure, it's an overhead to the processing you're actually interested in, but I hardly imagine that it's in "straw that breaks the camel's back" territory.
The assertion that JSON parsing is faster than I/O for most applications does not match my experience. First, achieving 10 GB/s consistent throughput for I/O on a server is pretty ordinary these days. If your systems are designed for performance and efficiency, parsing JSON with throughput that will saturate the I/O capacity is a challenge. Second, a lot of JSON that gets parsed in applications isn’t transiting I/O at all, it is in memory and only a subset of what is parsed may transit I/O. In these cases, there is an even higher bandwidth threshold for peak performance — memory bandwidth.
I’ve profiled many applications that were completely and thoroughly bottlenecked on JSON parsing, even when care was taken to not be profligate with JSON parsing because everyone knows it is quite slow.
Just because something isn't the slowest part of your pipeline doesn't mean it's not worth optimising.
It might be the slowest part of the pipeline you control, or your network might be a local network with 10G throughput. Modern SSDs can do 3+ GB/second. Even if you're only aiming at 1 GB/second, your JSON parser has to get through a megabyte in about a millisecond just to keep up with an ordinary disk. If JSON parsing is fast enough, you can use it as an interchange format for high-performance tasks, and if it's your input and output format, you can keep using existing JSON-based tools for the non-performance-critical parts!
Because it’s not true that JSON parsing is I/O bound. It might be if you have an old 5400 RPM laptop hard drive, otherwise it’s CPU bound. There are many, many benchmarks which will indicate this to you, including ones in TFA.
Don't benchmarks like these usually pre-warm the file cache? If so, then these benchmarks wouldn't be evidence that parsing is I/O bound, since they're reading from kernel memory, right?
Correct, good parsing benchmarks will ensure the input is in memory or, at least, heavily cached with a linear access pattern. And, the point is, those benchmarks often show libraries parse JSON at a good deal less than 100 MB/s, or, for the fast ones, maybe 100-400 MB/s. Those parsing rates are not fast enough to claim that JSON parsing is I/O bound.
Of course, the original comment is saying that most “apps” are I/O bound anyway (shall we assume web apps?). I think this is a lazy argument, or at least an ignorant/self-centered one — plenty of apps are not web apps running in an embarrassingly slow context like Django or Rails. For example, I work in digital forensics/cyber security, and we have to scan through TBs of logs (sometimes in JSON).
A concrete example here is that almost everything in the Kubernetes ecosystem (the control plane, operators, integrations, etc) is CPU bound by time spent Marshalling/Unmarshalling JSON or protobuf. So in some domains it’s all that moves the needle (in Kube I personally spent several years in protobuf, alternate JSON impls, fine tuning the hot path). Some problems are marshalling bound, although it’s not all of them.
It makes me sad every time I see a rewrite from scratch of a working, optimized and well tested library for the sake of writing it in a newer/trendy language...
In my opinion this is much worse than reinventing the wheel.
Especially when the result is, in the end, slower.
Being able to allocate only a few KB to ingest any JSON is a killer feature.
There is a valid reason to reinvent the wheel: in my case, I had to do something similar in C99 for a SaaS federated search engine, to get the lowest memory footprint possible.