[in Go] seems relevant enough to be added to the title: other languages could fairly easily bind to existing high-performance JSON parsers like sajson or simdjson, but doing so from Go incurs all the usual cgo issues.
What are the usual cgo issues? I've used it a fair bit and haven't had any general problems.
There are specific problems, like not being able to use syscalls that must be made from a single-threaded process (a royal pain for container managers, since setns() is one of them), but binding to a library like simdjson wouldn't have this problem.
> What are the usual cgo issues? I've used it a fair bit and haven't had any general problems.
1. cgo calls have much higher overhead than regular function calls, so for small documents you'd likely lose performance rather than gain it, and even for large ones it might be terrible depending on how you read from the parser; callback-based libraries are worse still, since calling Go from C is even slower (see the micro-benchmark sketch after this list)
2. concurrency can suffer a lot, as a cgo call prevents the corresponding goroutine from being switched out, tying up a scheduler thread
3. cgo complicates builds, especially cross-compilation
4. it also makes deployments more complicated if you were relying on just syncing a statically linked binary
5. this might have improved since, but most of the built-in Go development tools used to be unable to cross the cgo barrier, and non-Go devtools generally don't support Go
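A quick way to see point 1 for yourself is a micro-benchmark comparing a trivial cgo call against an equivalent Go call. This is only a sketch (the C function is made up, and it needs CGO_ENABLED=1 plus a C toolchain), not anything from the article:

```go
package main

/*
static int c_identity(int x) { return x; }
*/
import "C"

import (
	"fmt"
	"time"
)

//go:noinline
func goIdentity(x int) int { return x }

func main() {
	const n = 1_000_000
	var sink int

	// Time n round trips across the cgo boundary.
	start := time.Now()
	for i := 0; i < n; i++ {
		sink = int(C.c_identity(C.int(i)))
	}
	perCgo := time.Since(start) / n

	// Time n plain (non-inlined) Go calls for comparison.
	start = time.Now()
	for i := 0; i < n; i++ {
		sink = goIdentity(i)
	}
	perGo := time.Since(start) / n

	_ = sink
	fmt.Printf("cgo call: ~%v each, Go call: ~%v each\n", perCgo, perGo)
}
```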
There's a latency overhead to cgo calls that isn't really on the list of things to be optimised. It renders the language unusable for wrapping around OpenGL and the like in a (complex) gaming context.
In the main scanner function, a few minor performance-squeezing notes I'd like to test:
1. Moving "length := 0" above the for loop, since it's reassigned in all the needed cases.
2. To avoid an extra "if whitespace[c]" check, including the whitespace cases in the main switch statement, even if it means duplicating or moving "s.br.release".
Or, if the separate check must stay, using a switch statement vs a lookup ("whitespace[c]") (see the benchmark sketch after this list).
3. In the switch statement, using multiple assignment (in most cases).
4. In the String and default cases, inlining the length assignment within the if statement.
5. Returning "s.br.window()[:length]" in each case vs breaking out of the switch statement to return; even though it's ugly, it avoids one step.
6. I'm curious if any performance could be gained by including more cases for common characters (A-Z, a-z, 0-9) to avoid the default case, i.e. testing whether there is a penalty for hitting the default case vs listing more cases, even if it's ugly.
7. Including additional cases for exact values to avoid extra function calls to "parseString(&s.br)" or "s.parseNumber()".
8. I'm curious whether, in some cases, peeking at the next character with a nested switch statement could avoid additional iterations or function calls to validate/release.
9. In the whitespace check, peeking for common JSON formatting patterns to avoid iterations, such as 2- or 4-space-indented JSON: a newline followed by tabs or spaces, etc. Or possibly establishing that the JSON is "probably2Spaced/probably4Spaced" and then peeking more efficiently?
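For the switch-vs-lookup sub-question in point 2, here is a self-contained benchmark sketch (the input and names are made up; the article's scanner isn't reproduced). Run it with `go test -bench=. -benchmem`:

```go
package wsbench

import "testing"

// 256-entry table, as in the lookup-based approach.
var whitespace = [256]bool{' ': true, '\t': true, '\n': true, '\r': true}

// Illustrative sample input with typical JSON formatting.
var input = []byte("{ \"key\": [1, 2, 3],\n    \"nested\": { \"a\": true } }")

func isWSSwitch(c byte) bool {
	switch c {
	case ' ', '\t', '\n', '\r':
		return true
	}
	return false
}

var sink int

func BenchmarkWhitespaceTable(b *testing.B) {
	for i := 0; i < b.N; i++ {
		n := 0
		for _, c := range input {
			if whitespace[c] {
				n++
			}
		}
		sink = n
	}
}

func BenchmarkWhitespaceSwitch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		n := 0
		for _, c := range input {
			if isWSSwitch(c) {
				n++
			}
		}
		sink = n
	}
}
```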
> 7. Including additional cases for exact values to avoid extra function calls to "parseString(&s.br)" or "s.parseNumber()".
I can see how you might choose some numbers to optimize for (1..10 for example) - but strings? You could of course do a frequency analysis of the test data - but would that help in general, beyond just cheating on the benchmark?
I guess you could try for "key" and "value", and maybe "id"? Possibly adding "email" and "name"?
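A hedged sketch of the frequency-analysis idea (the sample document here is made up): decode some representative data and count how often each object key appears, to see which exact values might be worth special-casing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// countKeys walks a decoded JSON value and tallies every object key it finds.
func countKeys(v interface{}, counts map[string]int) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			counts[k]++
			countKeys(child, counts)
		}
	case []interface{}:
		for _, child := range t {
			countKeys(child, counts)
		}
	}
}

func main() {
	doc := []byte(`[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "email": "b@x"}]`)

	var parsed interface{}
	if err := json.Unmarshal(doc, &parsed); err != nil {
		panic(err)
	}
	counts := map[string]int{}
	countKeys(parsed, counts)
	fmt.Println(counts) // e.g. map[email:1 id:2 name:2]
}
```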
Also from tfa regarding numbers:
> Scanner.parseNumber is slow because it visits its input twice; once at the scanner and a second time when it is converted to a float. I did an experiment and the first parse can be faster if we just look to find the termination of the number without validation, canada.json went from 650mb/s to 820mb/sec.
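A hedged sketch of that idea: find where a number token ends without validating it, and leave the real validation to strconv.ParseFloat on the second pass. This is only an illustration, not the article's implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// numberEnd returns the index just past the last byte that could belong to a
// JSON number starting at b[0]. It does no validation at all.
func numberEnd(b []byte) int {
	for i, c := range b {
		switch c {
		case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
			'-', '+', '.', 'e', 'E':
			// still inside the number
		default:
			return i
		}
	}
	return len(b)
}

func main() {
	buf := []byte(`-123.45e2,"next"`)
	end := numberEnd(buf)
	f, err := strconv.ParseFloat(string(buf[:end]), 64) // validation happens here
	fmt.Println(string(buf[:end]), f, err)
}
```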
I wrote the original simdjson code along with Daniel Lemire. The Go version of simdjson (it's a rewrite, not just a binding to the C++ code) is slower than the original simdjson but still 8-15x faster than encoding/json.
I don't know how the two compare, as I don't really know where the overheads happen in the Go version. Assuming the analogous case is decoding into an interface{}, the simdjson port would be considerably faster.
According to the article, encoding/json is particularly slow with respect to decoding due to allocations. Notably, the API makes it difficult (impossible?) to avoid these allocations. Do you know if simdjson is significantly faster in this regard? And if you don't know that, do you know if the decoding API is the same as with encoding/json?
From the OP: 'I believed that I could implement an efficient JSON parser based on my assumption that encoding/json was slower than it could be because of its API. It turned out that I was right, it looks like there is 2-3x performance in some unmarshalling paths and between 8x and 10x performance in tokenisation, if you’re prepared to accept a different API.'
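On the allocation point: encoding/json's streaming API returns json.Token, which is an interface{}, so string and number tokens get boxed on every call. A minimal benchmark sketch (not from the article) that shows this with `go test -bench=. -benchmem`:

```go
package tokalloc

import (
	"bytes"
	"encoding/json"
	"testing"
)

// Illustrative sample document.
var doc = []byte(`{"a": 1, "b": [true, "x", 2.5]}`)

// BenchmarkStdTokenizer walks the document with Decoder.Token, whose
// interface{}-based return type forces string and number tokens to be boxed.
func BenchmarkStdTokenizer(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dec := json.NewDecoder(bytes.NewReader(doc))
		for {
			if _, err := dec.Token(); err != nil {
				break
			}
		}
	}
}
```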
The cache line on x86 is 64 bytes. Your whitespace lookup table is way too big. At the very least, you can subtract (with overflow) '\t' and check that the character is not greater than ' ' before hitting the LUT.
The ASCII table is ripe for bit twiddling (I suspect it was organized with that in mind). You may find exploitable bit patterns in the whitespace chars.
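A hedged sketch of that bit-twiddling idea: JSON's four whitespace bytes (tab, LF, CR, space) all sit below 64, so a single uint64 mask can stand in for the 256-entry table:

```go
package main

import "fmt"

// wsMask has one bit set per JSON whitespace byte: '\t' (9), '\n' (10),
// '\r' (13) and ' ' (32). All four fall below bit 64, so one uint64 suffices.
const wsMask uint64 = 1<<'\t' | 1<<'\n' | 1<<'\r' | 1<<' '

func isWhitespace(c byte) bool {
	return c < 64 && wsMask&(1<<c) != 0
}

func main() {
	for _, c := range []byte{' ', '\t', '\n', '\r', 'a', '{'} {
		fmt.Printf("%q -> %v\n", c, isWhitespace(c))
	}
}
```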
Interesting approach. A different approach was taken with OjG (https://github.com/ohler55/ojg), which shows a significant performance improvement over the current Go JSON parser.
Just finished it last week. I'm pretty happy with the results but even more pleased with the JSONPath. It even works on regular types using reflection.
Deno is a JavaScript runtime built in Rust, so I guess it depends what you mean by "types". Definitely not strong typing, although the example could be rewritten in TypeScript, which Deno supports natively.
I use Deno sometimes, so it was of interest to me to compare it to the advancements presented here for Go.
I like Deno (v8 is written in C++, not Rust, for what it's worth); the question is just: is it doing the same work that the encoding/json benchmark is doing? What are we comparing?
You can see in the linked code that it just calls `const parsed = readJsonSync(fileName)`, which likely returns an untyped object. So this purely does parsing into an anonymous structure, not validation and mapping into something strongly typed.
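To make that distinction concrete on the Go side (a hedged sketch with made-up field names): decoding into interface{} is the analogue of the untyped readJsonSync result, while decoding into a struct also does the mapping into something strongly typed:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	doc := []byte(`{"name": "amy", "age": 32}`)

	var anon interface{} // untyped: ends up as a map[string]interface{}
	if err := json.Unmarshal(doc, &anon); err != nil {
		panic(err)
	}

	var typed user // strongly typed target: field types are checked while decoding
	if err := json.Unmarshal(doc, &typed); err != nil {
		panic(err)
	}

	fmt.Printf("%#v\n%#v\n", anon, typed)
}
```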
All this effort on building a better JSON parser for language 'X' would have been better spent on inventing a statically-typed JSON. (Type inference seems like a thing that should work well in this space.)
What you're describing exists: Protobuf, FlatBuffers, Cap'n Proto, XDR, Parquet, Arrow, Avro, ORC, SQLite. Indeed, if you use these formats you will be able to load the data into memory more quickly than with JSON.
But this article isn't about those protocols. It's about JSON.
You can do it with JSON too. Various statically-typed languages do JSON parsing straight into structs all the time. (e.g. in the Rust ecosystem, Serde is the most popular implementation of this concept.)
That doesn’t need to be a problem. In Rust, the most popular thing in this space is Serde, with the serde_json crate. It can cope with such things just fine; define a data type that is capable of representing all the possibilities (perhaps as simple as an enum with a variant for each style, or perhaps it normalises them into one), and then… well, you probably have to write a visitor to help deserialise, which isn’t trivial, especially when you’re used to just deriving those things and so you’re not so familiar with Serde internals. But it’s possible. https://serde.rs/string-or-struct.html gives one example. The length of the example may be daunting, but it’s actually not too bad.
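For readers coming from Go rather than Rust, here's a hedged sketch of the same "string or struct" idea using a custom UnmarshalJSON; the Build type is made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Build struct {
	Context    string `json:"context"`
	Dockerfile string `json:"dockerfile,omitempty"`
}

// UnmarshalJSON accepts either a bare string ("./app") or a full object.
func (b *Build) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		b.Context = s
		return nil
	}
	type plain Build // defined type drops the method set, avoiding recursion
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	*b = Build(p)
	return nil
}

func main() {
	var a, b Build
	_ = json.Unmarshal([]byte(`"./app"`), &a)
	_ = json.Unmarshal([]byte(`{"context": "./app", "dockerfile": "Dockerfile.dev"}`), &b)
	fmt.Printf("%+v\n%+v\n", a, b)
}
```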
ProtoBuf requires the document's schema to be known ahead of time and involves deserialization; Ion is more like JSON in that it's self-describing, but has a rich set of data types as well as annotations on fields to further describe meaning. FlatBuffers requires a schema and provides access to the data without parsing/unpacking.
On the other hand, if you're building a service and specifying its interface, rather than just specifying a data format, then there are tools like Smithy, gRPC, and Thrift.
JSON is popular because it's schemaless; storing and versioning schemas is its own nightmare of inconsistent and inefficient hacks.
However, in practice, JSON messages almost always have an implied schema. Figuring it out on the fly shouldn't be too hard - just apply the JIT and type inference technologies that compilers have already used for decades.
(Not a problem with an immediately obvious solution, but that's why we're here, no?)
> However, in practice, JSON messages almost always have an implied schema.
Typed JSON already exists in various attempts, e.g. TJSON.
Really I think the problem is that JSON is great when your schema is still in flux -- once it's public/stable, it makes sense to want to strictly encode the current schema.
The solution then is that you really want a way to trivially transition from "implied schema" to "strict schema" -- though a typed JSON won't help you too much, because the larger trouble is figuring out what your schema is in practice. I guess you really want either a code analysis tool to determine it, or more likely an automated tool that takes e.g. a list of JSON responses (probably based on a test suite/code coverage) and produces a typed JSON schema that you can simply drop into your public API.
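As a toy illustration of that last idea (entirely hypothetical tooling, not an existing project): walk a few captured responses and record the type seen at each top-level key. A real tool would recurse, merge optional/nullable fields, and emit JSON Schema or Go structs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Pretend these came from a test suite's recorded responses.
	samples := [][]byte{
		[]byte(`{"id": 1, "name": "a", "tags": ["x"]}`),
		[]byte(`{"id": 2, "name": "b", "tags": []}`),
	}

	schema := map[string]string{} // key -> inferred type name
	for _, s := range samples {
		var doc map[string]interface{}
		if err := json.Unmarshal(s, &doc); err != nil {
			panic(err)
		}
		for k, v := range doc {
			schema[k] = fmt.Sprintf("%T", v) // e.g. float64, string, []interface {}
		}
	}
	fmt.Println(schema)
}
```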
Given that almost all JSON applications are going to be I/O bound (reading from disk or, more likely, the network), what's the fascination with making super-fast JSON decoders, beyond the engineering challenge? Sure, it's an overhead to the processing you're actually interested in, but I hardly imagine that it's in "straw that breaks the camel's back" territory.
The assertion that JSON parsing is faster than I/O for most applications does not match my experience. First, achieving 10 GB/s consistent throughput for I/O on a server is pretty ordinary these days. If your systems are designed for performance and efficiency, parsing JSON with throughput that will saturate the I/O capacity is a challenge. Second, a lot of JSON that gets parsed in applications isn’t transiting I/O at all, it is in memory and only a subset of what is parsed may transit I/O. In these cases, there is an even higher bandwidth threshold for peak performance — memory bandwidth.
I’ve profiled many applications that were completely and thoroughly bottlenecked on JSON parsing, even when care was taken to not be profligate with JSON parsing because everyone knows it is quite slow.
Just because something isn't the slowest part of your pipeline doesn't mean it's not worth optimising.
It might be the slowest part of the pipeline you control, or your network might be a local network with 10G throughput. Modern SSDs can do 3+ GB/second. Even if you're only aiming at 1 GB/second, your JSON parser has to get through a megabyte in about a millisecond just to keep up with an ordinary disk. If JSON parsing is fast enough, you can use it as an interchange format for high-performance tasks, and if it's your input and output format, you can keep using existing JSON-based tools for the non-performance-critical parts!
Because it’s not true that JSON parsing is I/O bound. It might be if you have an old 5400 RPM laptop hard drive, otherwise it’s CPU bound. There are many, many benchmarks which will indicate this to you, including ones in TFA.
Don't benchmarks like these usually pre-warm the file cache? If so, then these benchmarks wouldn't be evidence that parsing is I/O bound, since they're reading from kernel memory, right?
Correct, good parsing benchmarks will ensure the input is in memory or, at least, heavily cached with a linear access pattern. And, the point is, those benchmarks often show libraries parse JSON at a good deal less than 100 MB/s, or, for the fast ones, maybe 100-400 MB/s. Those parsing rates are not fast enough to claim that JSON parsing is I/O bound.
Of course, the original comment is saying that most “apps” are I/O bound anyway (shall we assume web apps?). I think this is a lazy argument, or at least an ignorant/self-centered one — plenty of apps are not web apps running in an embarrassingly slow context like Django or Rails. For example, I work in digital forensics/cyber security, and we have to scan through TBs of logs (sometimes in JSON).
A concrete example here is that almost everything in the Kubernetes ecosystem (the control plane, operators, integrations, etc) is CPU bound by time spent Marshalling/Unmarshalling JSON or protobuf. So in some domains it’s all that moves the needle (in Kube I personally spent several years in protobuf, alternate JSON impls, fine tuning the hot path). Some problems are marshalling bound, although it’s not all of them.
It makes me sad every time I see a rewrite from scratch of a working, optimized and well tested library for the sake of writing it in a newer/trendy language...
In my opinion this is much worse than reinventing the wheel.
Especially when the result is, in the end, slower.
Being able to allocate only a few KB to ingest any JSON is a killer feature.
There is a valid reason to reinvent the wheel: in my case, I had to do something similar in C99 for a SaaS federated search engine, to get the lowest memory footprint possible.