Stream VByte: breaking new speed records for integer compression (lemire.me)
129 points by ingve on Sept 27, 2017 | hide | past | favorite | 11 comments



Just to give you an idea of how effective this strategy seems to be: it seems very similar to what the "COST in the land of databases" submission[1] the other day was talking about, where choosing interesting algorithms that fit the problem, instead of large but general distributed systems, often yields very good results (enough that a single thread can often beat 100+ core distributed systems).

In one of the prior blog posts[2] linked at the beginning of that article, he said this:

Using the Hilbert curve layout leads to a very simple compressed representation: having sorted the u64 values it uses to represent edges, one can simply delta-encode them (writing the differences between adjacent values). Because the Hilbert curve teases out locality, the gaps are often quite small, and therefore compress nicely. When I combined a variable-length integer encoding with gzip, the resulting file was only 45GB (about 2.8 bits per edge). Decompressed (with just the variable-length delta encoding), it is 154GB, which is the representation I used to get the measurements above.

He then goes on in his posts to comically trounce some large distributed systems.

1: https://news.ycombinator.com/item?id=15332051

2: https://github.com/frankmcsherry/blog/blob/master/posts/2015...


!#$!@# I was going to go try that out. ;) When doing the Chaos/Order stuff, the decompression was about half the CPU cycles, so thinning that could help.

Fwiw, the encoding I used wasn't VByte; because I was doing strict increments, I knew `0` wasn't a valid value to see, and used it to encode variable byte lengths (number of zeros: number of valid data bytes). Not smart, but easy to hack up, and gave me back 128-255 in one byte. I'm curious to check this out, now!
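One plausible reading of the zero-prefix scheme described above (a hedged sketch; the exact byte layout the commenter used isn't spelled out): since a strict increment means the value 0 never occurs, a run of z leading zero bytes can signal that z+1 data bytes follow, so values 1..255 fit in a single byte.

```python
def encode(value: int) -> bytes:
    """Encode a positive integer as z zero bytes followed by z+1 data bytes."""
    assert value > 0  # 0 is reserved as the length marker
    data = value.to_bytes((value.bit_length() + 7) // 8, "big")
    # one zero byte per *extra* data byte beyond the first
    return b"\x00" * (len(data) - 1) + data

def decode(buf: bytes, pos: int = 0):
    """Return (value, next_pos)."""
    z = 0
    while buf[pos + z] == 0:  # count the zero-byte length prefix
        z += 1
    start = pos + z
    value = int.from_bytes(buf[start:start + z + 1], "big")
    return value, start + z + 1
```

With this layout, 300 encodes to three bytes (`00 01 2c`) while anything up to 255 stays a single byte, matching the "gave me back 128-255 in one byte" remark.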


This method of compressing integer arrays as differences and then reducing the number of bits per difference is very efficient for many applications. We used it many years ago at a startup to compress and render vector maps in real time on mobile devices.

Not only did it make the data really small (we could fit all the roads of a country like Germany in 5 MB), it was also blazing fast to decode. Our vector map renderer did the decompression on the fly, while doing coordinate transformation and rendering to the screen. That was really the trick to getting real-time zooming and panning of vector maps on the ~100 MHz ARM CPUs of the time, with their tiny data caches.

We even took that compression scheme a step further, from byte granularity to bit granularity. The above article made me wonder how our old scheme would perform on today's CPUs...
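The delta-plus-varint idea described above can be sketched as follows (a minimal illustration using classic 7-bit VByte groups, not the startup's actual format): sorted values are stored as gaps, and each gap is written in as few bytes as it needs.

```python
def vbyte_encode(n: int) -> bytes:
    """Classic VByte: 7 data bits per byte, high bit set on the last byte."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)

def compress_sorted(values):
    """Delta-encode a sorted sequence, then VByte-encode each gap."""
    out, prev = bytearray(), 0
    for v in values:
        out += vbyte_encode(v - prev)
        prev = v
    return bytes(out)

def decompress_sorted(buf):
    """Invert compress_sorted: rebuild gaps, then prefix-sum them."""
    values, prev, n, shift = [], 0, 0, 0
    for b in buf:
        if b & 0x80:  # terminator byte: finish this gap
            prev += n | (b & 0x7F) << shift
            values.append(prev)
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return values
```

Because sorted coordinates cluster, most gaps fit in one byte, which is where the small file sizes come from.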


One of the problems with SIMD is that we can't do this kind of shuffle at bit granularity, at least not in one instruction.


If you want bit-level granularity, you can use either SIMD-BP128 or SIMD-FastPFOR from https://arxiv.org/pdf/1209.2137.pdf


This is indeed fast and simple, since the pshufb instruction allows unpacking of bytes into multiple zero-padded words in one step. This technique uses a precomputed table of shuffle patterns, so it's a short sequence of instructions to fetch the control information for each 4-word group and apply the given shuffle.
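The control-byte mechanism behind this can be modeled in scalar code (a hedged sketch of the idea, not Lemire's implementation): each control byte packs four 2-bit length codes, one per integer, and a 256-entry table indexed by the control byte plays the role of the precomputed pshufb patterns.

```python
# Precompute, for every possible control byte, the four byte lengths it
# encodes (2 bits per integer, length 1..4) -- the scalar analogue of the
# shuffle-pattern table.
LENGTHS = [[((c >> (2 * i)) & 3) + 1 for i in range(4)] for c in range(256)]

def encode_quad(quad):
    """Pack four uint32 values into (control_byte, data_bytes)."""
    control, data = 0, bytearray()
    for i, v in enumerate(quad):
        length = max(1, (v.bit_length() + 7) // 8)  # 1..4 bytes
        control |= (length - 1) << (2 * i)
        data += v.to_bytes(length, "little")
    return control, bytes(data)

def decode_quad(control, data):
    """Unpack four values; the table lookup replaces per-byte branching."""
    out, pos = [], 0
    for length in LENGTHS[control]:
        out.append(int.from_bytes(data[pos:pos + length], "little"))
        pos += length
    return out
```

In the SIMD version, the same table instead holds 16-byte shuffle masks, so one pshufb expands all four integers to zero-padded 32-bit lanes at once.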


"I should mention that unlike Amazon, we did not patent our approach. We want you to use this in your commercial and open-source projects!"

Thanks Dan!


Could there be a similar scheme for floats?

Also, is there a penalty for using float16s in CPU-centric code?


Using fixed-point representation would allow using this compression method.
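For instance (a minimal sketch, with an arbitrarily chosen scale factor): quantize the floats to integers at a fixed precision, after which delta-plus-varint compression applies directly.

```python
SCALE = 10**4  # hypothetical choice: keep 4 decimal digits of precision

def to_fixed(xs):
    """Quantize floats to integers; these can then be delta/varint coded."""
    return [round(x * SCALE) for x in xs]

def from_fixed(ns):
    """Recover (approximate) floats from the fixed-point integers."""
    return [n / SCALE for n in ns]
```

The trade-off is a known, bounded quantization error (here half of 1/SCALE) in exchange for integer compressibility.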


The example code should replace "datasource" with "databytes".


Na





