There's quite a lot of retro modern LZ activity too! LZ turns out to be amazing on old machines, often only a couple times slower than a block copy. Optimal compressors and control over the algorithm have led to some very tight demos.
https://github.com/emmanuel-marty/lzsa - LZSA - a LZ4-like modern LZ that's more efficient than LZ4 in both speed and compression (at least on the 8-bitters it targets) - includes a neat chart of speed/compression trade-offs on a ZX Spectrum with a variety of algorithms
"Managing Gigabytes" by Witten et al is _the_ book that got me into the field of compression back in the day. Granted, it’s a bit dated now as there’s been a lot of progress in the field, but I’d still heartily recommend it to newcomers.
Only partly related, since it deals with a specific type of data (English language Wikipedia), some of you might enjoy reading about the Hutter Prize: https://en.wikipedia.org/wiki/Hutter_Prize
> First, we just shorten any symbols that are longer than our maximum code length — 11 — to that value. This means that our tree will no longer be a Huffman tree, so we need to do a fixup pass to redistribute the error we've introduced.
I've spent hours trying to understand how you apply this theory. The internet is surprisingly short on actual examples showing how it's done. The best I've found that explains package-merge is this page: https://create.stephan-brumme.com/length-limited-prefix-code...
I just made a Huffman encoder on the C64, for fun. I need to understand how you go from the variable-length codes of Huffman to suddenly fixed-length codes, because you don't want the codes to be above a certain length. Hmm...
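For anyone else stuck on this: the "shorten then fix up" idea from the article can be sketched directly in terms of the Kraft inequality. Here's a toy greedy repair in Python (my own illustration of the clamp-and-fixup approach, not package-merge, which is what you'd use for truly optimal length-limited codes):

```python
def limit_code_lengths(lengths, max_len):
    """Clamp Huffman code lengths to max_len, then repair the Kraft sum.

    A prefix code is valid iff sum(2^-l) <= 1 (Kraft inequality).
    Clamping long codes can push the sum over 1; we fix that by
    lengthening other codes, cheapest (deepest) first.
    Assumes len(lengths) <= 2^max_len, or no valid code exists at all.
    """
    lengths = [min(l, max_len) for l in lengths]
    budget = 1 << max_len  # Kraft sum scaled by 2^max_len to stay in integers
    kraft = sum(1 << (max_len - l) for l in lengths)
    while kraft > budget:  # tree is over-subscribed
        # Lengthen the deepest code that can still grow: it carries the
        # least probability mass, so this costs the least compression.
        i = max((j for j, l in enumerate(lengths) if l < max_len),
                key=lambda j: lengths[j])
        kraft -= 1 << (max_len - lengths[i] - 1)  # 2^-l becomes 2^-(l+1)
        lengths[i] += 1
    return lengths
```

E.g. clamping the (valid) lengths [1, 2, 3, 3] to a maximum of 2 gives [2, 2, 2, 2]: the clamp alone would leave an over-full tree, so the 1-bit code gets lengthened too.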
This might not apply to all LZ based programs/algorithms but compress (an old Unix utility based on the Lempel-Ziv-Welch algorithm) fell out of favour after it ran afoul of patent issues (you can read more about it on the compress Wikipedia page [0] and a section of the GIF Wikipedia page [1] which covers more about the enforcement of the patent). From what I can gather though, it enjoyed considerable success and was pretty much the standard utility for compressing data on Unix. I think eventually though it would have been replaced because other algorithms (bzip2, gzip, etc.) have slightly better compression ratios (even if they are more computationally expensive).
LZ is popular because it's computationally efficient. When random memory access is slow but sequential access is fast, you want to encode substrings instead of individual symbols. Explicit higher-order probabilistic models require a lot of memory and generate too many cache misses. Implicit models based on the Burrows-Wheeler transform also have poor memory access patterns.
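To make "encode substrings instead of individual symbols" concrete, here's a toy greedy LZ77 in Python. This is purely illustrative (my own sketch): real matchers use hash chains or suffix structures instead of this quadratic scan, and real formats don't emit naive triples.

```python
def lz77_compress(data, window=255):
    """Greedy LZ77: emit (offset, length, next_char) triples."""
    out = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            # Matches may overlap past i; the decompressor copies
            # byte by byte, so overlapping copies work (classic LZ77).
            while i + l < len(data) - 1 and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_off, best_len = i - j, l
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    """Rebuild the string: copy `length` bytes from `offset` back, then
    append the literal. Sequential access only, tiny state."""
    out = []
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])
        out.append(nxt)
    return ''.join(out)
```

Even this toy version shows the appeal: decompression is a pure sequential copy loop with no model state to speak of.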
I think that's misidentifying where credit is due: LZ might not be the most efficient compression format when it comes to size, but it's very efficient computation-wise when compressing and decompressing, and this still holds true today even when you consider newer algorithms like Zstandard (which uses a derivative of LZ). Sure, if you need to archive data in its most compact representation at any cost, LZ and derivatives won't deliver it, but unless there's a significant change to how computing works, the trade-off is still in LZ's favour. At the time DEFLATE was designed, LZ simply beat all the competition, since realistically nothing else could run decompression in real time using minimal memory.
lzma is slow to decompress. Unless the bandwidth / storage costs are the primary concern, zstd has good enough compression ratio and the very fast decompression makes everything much nicer. Built-in threaded compression, long range compression, binary diff, dictionary compression, etc. are a bonus.
I never understood zstd. It's basically lzma+gzip+lz4 packed together. Show me any zstd level with a speed/size trade-off that none of the levels of those three can match.
> Show me any zstd level with a speed/size trade-off that none of the levels of those three can match.
In case you are not just trolling: I had a very large JSON file with a large amount of text on my SSD, and the compression times were as follows [1]:
Zstandard is not just lzma+gzip+lz4. It is better than everything you've mentioned at least for my test case. In fact I first compressed with zstd and tried to match the compression time for others, because zstd is very fast and using anything else as a reference point would have taken me much more time. It does have enough reason to claim itself to be "standard".
[1] Tested with an i7-7700 @ 3.60GHz and 48 GiB of RAM (but no algorithm tested uses more than several MBs of memory anyway). I'm using a pretty fast SSD here, so I/O speed is not much of a concern. Also note that every algorithm except for lzma is single-threaded.
EDIT: axiolite wanted to see --single-thread, and while my Linux box only has an older zstd (1.3.3) without that option, I realized I do have a Windows executable for a recent zstd (1.5.0). Both executables ran on the same machine, but I can't guarantee anything about the 1.5.0 binary.
--single-thread probably doesn't do what you think: it forces zstd to serialize I/O with the compression job, and it doesn't mean compression itself is multi-threaded with plain -T1 [EDIT]. I can't try it anyway because my zstd version is slightly older (1.3.3), but I can try pigz:
3,664,854,169 0:42 pigz -1 -p4
3,664,854,169 0:28 pigz -1 -p8 (default, also probably the max possible with my box)
Now I'm very curious where you got your copy of zstdmt. (I'm using stock Ubuntu packages for all of them.)
[EDIT] Was "it doesn't make compression itself multi-threaded", which can falsely imply that --single-thread seemingly enables multi-threaded compression but it doesn't ;)
Very interesting. It might be the case that zstd is optimized more for recent machines; zstd famously uses four interleaved compression streams to maximize instruction-level parallelism, and that might not work as well on older machines. I haven't seen any machine where zstd is significantly slower than it should be, but the machines I could test date from 2013 or later. Alternatively, the RHEL package might have been built for recent machines. It would be interesting to test a binary optimized for the current machine (-march=native -mtune=native).
Isn't that exactly the point, and why it's so great? It's a single algorithm that's tune-able across that wide range of speed/ratio trade-offs using a single parameter. So you can just use one thing for almost every application instead of picking between three different things.
(Yes, I know that one parameter controls multiple different settings internally, so there are multiple dimensions if you're willing to dig that deep.)
Anyway, my recollection from looking at benchmarks a while ago: zstd used at similar ratios to lzma compresses in similar time but decompresses much faster, and it's also faster than gzip at comparable ratios. lz4 is still faster than the fastest zstd modes, and lzma at its most extreme settings still gets better ratios than the best zstd can do. But there's a huge swath in the middle where zstd beats them all, and that's quite valuable.
> So you can just use one thing for almost every application instead of picking between three different things.
I honestly can't see how remembering several ranges of zstd levels and their speed/ratio trade-offs is any easier than remembering lz4, pigz, xz, which all happen to have good default settings.
I use zstd once a year, and it is very easy:
1. the default (3 on my machine) is fast and compresses well.
2. if you need to get the best compression level for your use case, use the benchmark "zstd -b1 -e15 yourfile" and after a few minutes you have your answer.
I can't see why I should ever in my life use another program for compressing things than zstd.
It's really not lzma+gzip+lz4, except if you mean "it's an LZ-alike". It's a modern LZ with repcodes (kinda LZMA-ish, but honestly more like LZX), and it uses a different entropy-coding scheme (Huffman and FSE, a form of ANS) instead of a range coder (LZMA) or no entropy coder at all (LZ4).
Here are some DB dump compression tests done years ago, look at the pareto frontier and see that zstd wins between LZ4 and LZMA (xz): https://zeevt.github.io/compress_pareto.html
> For our goal of a high compression LZ variant, we will want to Huffman code our symbols (zstd actually uses FSE, a variant of arithmetic coding to do this, but we will cover that in a future post).
For high entropy data (=somewhat random data), FSE is quite comparable to Huffman in compression speed.
For low entropy (lots of high probability symbols, like zeroes), Huffman is about 2-3x faster. But on the flipside, FSE achieves markedly higher compression ratio.
I think FSE is worth the speed tradeoff vs Huffman in most cases.
"FSE achieves markedly higher compression ratio". I do not thin it is true, FSE/ANS achieves slightly better ratios in general.
Zstd uses both Huffman (for large alphabets) and FSE (for small alphabets).
"Arithmetic coding is much, much simpler." Let us agree to disagree.
"And decompression speed is not a limiting factor in most applications of data compression like this". It depends on the application. Zstd and Brotli are certainly aiming at the fastest decompression speed possible.
The Huffman encoding loop is 2 lines and decoding loop is 4 lines of branchless code.
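To illustrate that claim (my own sketch, in Python for readability; the real thing is a handful of lines of C): a table-driven canonical Huffman decode avoids per-symbol branching by mapping every possible MAX_BITS-wide bit window to a (symbol, length) pair, so the inner loop is one lookup plus a shift.

```python
MAX_BITS = 4  # maximum code length; table has 2^MAX_BITS entries

def build_table(codes):
    """codes: {symbol: (code_value, code_length)} for a canonical code.
    Every bit pattern that starts with a code maps to that code's symbol."""
    table = [None] * (1 << MAX_BITS)
    for sym, (code, length) in codes.items():
        for fill in range(1 << (MAX_BITS - length)):
            table[(code << (MAX_BITS - length)) | fill] = (sym, length)
    return table

def decode(bits, nbits, nsyms, table):
    """Decode nsyms symbols from the integer `bits` (MSB-first, nbits wide)."""
    padded = bits << MAX_BITS  # pad so the final window never runs short
    out, pos = [], 0
    for _ in range(nsyms):
        window = (padded >> (nbits - pos)) & ((1 << MAX_BITS) - 1)
        sym, length = table[window]  # one lookup, no data-dependent branch
        out.append(sym)
        pos += length                # consume only the bits actually used
    return out
```

The per-symbol work has no data-dependent branch at all; the only "branch" left is the loop counter, which is exactly the point of the parent comment.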
Do you have an example of branchless arithmetic encoder or decoder ?
https://www.brutaldeluxe.fr/products/crossdevtools/lz4/index... LZ4 Data Compression - a rather long and in-depth article looking at LZ4 on the 65816 for the Apple IIgs with a decompressor that exploits that processor's block copy.