Chrome Feature: ZSTD Content-Encoding (chromestatus.com)
213 points by croes 10 months ago | 112 comments



Part of my infrastructure uses compressed "dd" disc images. Last week I switched them from "xz" to "zstd": "zstd -19" compresses almost as well as xz (1.8GB vs 1.7GB), but the threaded decompression is >10x faster (72s -> 6s).

I made this change coincidentally, the day before the xz compromise.


I've been nothing but happy with ZSTD. Really good compression trade offs with ratios close to the best of LZMA.

zstd level 3 is nearly as fast as LZ4 while getting a much better ratio. Consequently, on my Linux boxes I use either ZFS or btrfs with zstd level 3 as the default filesystem compression.


Nearly as fast to compress, but much slower to decompress. Zstd is pretty good on decompression but LZ4 is top-tier.

Zstd is often "fast enough" in decompression for most cases, though.


It's why I use LZ4 for ZRAM. Only at RAM speeds do LZ4's speeds matter.


Additional note: I run these on servers that are not very tall but are pretty wide: 40 1.8GHz cores or so, so multi-threaded decompression really helps. I was using xz before, which doesn't support threaded decompression (according to the man page), and decided to switch to zstd rather than pixz.


zstd tries to link to liblzma by default.

https://raw.githubusercontent.com/facebook/zstd/dev/CHANGELO...

See v1.3.0

I compile it statically without xz/lzma support


(JiaT75 contributed to zstd as well as xz)




That looks like a(n unofficial) fork, not contributions to the official upstream repository.


I think there were some open PRs from that account, which were scrubbed after the backdoor in xz was discovered. But probably nothing got merged.


Is it possible to retrieve the old branches?


While looking to see how it compares to brotli, I found an interesting conversation between a zstd contributor and the author of brotli

https://news.ycombinator.com/item?id=19678985


The zstd implementation got a lot better in the last five years.


Thank you; I remember Google rejected zstd at the time.


It would be good if zstd had a standardised static web dictionary, like Brotli's https://www.rfc-editor.org/rfc/rfc7932#appendix-A. This would reduce the overhead for small files full of boilerplate like <!DOCTYPE html>.


Brotli's baked-in dictionary always irked me because it's forever fixed and can't be updated (not that I'm implementing new hypermedias on a weekly basis, but still). I'd much rather see broad adoption of plain `Content-Type: zstd` implemented with no dictionary, and later go through a standards process to add named content-specific dictionaries like `Content-Type: zstd-web` or `zstd-cn` or whatever.

Edit: Actually this is already considered in RFC 8878 [0]. The RFC reserves zstd frame dictionary IDs in the ranges <= 32767 and >= (1 << 31) for a public IANA dictionary registry, but no such dictionaries have been published for public use yet.

[0]: https://datatracker.ietf.org/doc/html/rfc8878#iana_dict


Have you seen this proposal yet? It allows domains to define their own dictionaries for compressing future responses, with delta updates for changes.

Still seems a bit complicated to me, but could be meaningful for web apps that are required to be large.

https://github.com/WICG/compression-dictionary-transport


Somehow this sounds like another future attack vector.


Perhaps, but at the very least it's gone through multiple rounds of security and privacy review from different groups.


That kind of appeal to authority is myopic, because all the major security and privacy issues are introduced by big companies with those teams. They are not that good at red-team thinking, which is what you need here. It's more expensive, though, and these teams are more about compliance and stopping only the most obviously bad ideas.


That's really not an example of appeal to authority. It's a simple statement of facts. This has passed reviews by Google, Facebook, Apple, Mozilla, and the W3C's security and privacy teams. Make of that what you will.


Or to put it more explicitly: `Content-Type: zstd` should have a standard dictionary, since that's far easier to add to a new proposal than to something already widely used.

The brotli dictionary appears to help with random text, not just html/css.


Content-Encoding?


Yes, thank you.


Possibly just use the one from Brotli since it's already standardized. If it's any good then the work is mostly already done, right?


They want it as a standard option (pre-shipped with all decompressors), not something they could incorporate into a customized fork/client that uses zstd with their own extensions.


The zstd API does allow you to supply your own initial dictionary, so there's no need to fork it to allow a browser implementation to use the brotli dictionary.

Personally, as someone who doesn't work in web, I'm just as happy that zstd is flexible this way. For my applications, the brotli dictionary is pure overhead that bloats the library.
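For anyone curious what that looks like in practice, here's a minimal sketch with the pure-Go github.com/klauspost/compress/zstd package (option names as I recall them from its docs; "web.dict" is a hypothetical dictionary file you'd train yourself, e.g. with `zstd --train`, or fetch from elsewhere):

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/klauspost/compress/zstd"
    )

    func main() {
        // Hypothetical dictionary file in zstd's dictionary format
        // (e.g. produced by `zstd --train` on sample documents).
        dict, err := os.ReadFile("web.dict")
        if err != nil {
            log.Fatal(err)
        }

        // Both sides get the dictionary up front; it is never
        // transmitted inside the compressed frames themselves.
        enc, err := zstd.NewWriter(nil, zstd.WithEncoderDict(dict))
        if err != nil {
            log.Fatal(err)
        }
        dec, err := zstd.NewReader(nil, zstd.WithDecoderDicts(dict))
        if err != nil {
            log.Fatal(err)
        }

        compressed := enc.EncodeAll([]byte("<!DOCTYPE html><html>...</html>"), nil)
        plain, err := dec.DecodeAll(compressed, nil)
        fmt.Println(len(compressed), string(plain), err)
    }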


Again, since there's confusion.

They want _every zstd decompressor_ to __already have__ the dictionary in question so that it can be specified as part of the standard, e.g. "instead of an empty / in-file initial dictionary, use standard dict #3". Such reference dictionaries would not be included in .zstd files, but would be shipped with the compressor source code.


There is an issue tracking this, with a bunch of links to discussions about it, but it seems they still haven't had time for it.

https://github.com/facebook/zstd/issues/3100

This was the first place my mind went when I saw this Content-Encoding announcement, so I ran and re-checked the issue :(.


It's not a standard dictionary, but you can use a custom shared one. https://developer.chrome.com/blog/shared-dictionary-compress...


Yeah zstd is nice on its own, but a proper dictionary will give you several times the compression ratio on top of that, plus super fast decompression. It's just absurdly good for specialized data.


Do Linux distributions use a dictionary with their package manager? Since their packages are typically zstd compressed, every distro (version) release could have its own accompanying dictionary.


Dictionaries only help with really tiny files.

> Typical gains range ~10% (at 64KB) to x5 better (at <1KB).

https://www.manpagez.com/man/1/zstd/zstd-1.1.0.php

Distros are unlikely to have many packages < 64 KiB, so the advantages of a dictionary rapidly diminish for this use case.


We have been live with zstd encoding for several weeks now, and last week public traffic really started to use it. We're now at 50% brotli, 40% zstd, 9% gzip. Our payload is almost exclusively JavaScript, and compression rates are 73% zstd (level 6), 72% brotli (level 4) and 65% gzip (level 4). However, 95th-percentile encoding time is 4.3ms for zstd, 2.1ms for brotli, and 0.1ms for gzip. At the 75th percentile they're all essentially the same. This is a Golang application, and we use a cgo brotli library but a pure-Go zstd library.


Are you unable to cache the zstd-compressed JavaScript in your app server (so as to save the 4.3ms of CPU on every request)? If not, are you sure the 4.3ms of CPU costs less than the 1% of bandwidth? A discussion linked here between a Brotli author and a zstd author got me looking into these questions more.


We compile/build JS on the fly per-request so we wouldn't be able to cache the JS ahead of time.

The difference between brotli and zstd is 2.2ms at the 95th percentile; at lower percentiles the differences are much smaller. This is still an area of active investigation/development for us. There are possibly content sizes where one is better than the other, and there are also probably server-side tweaks to improve the encoding time.


Unrelated, but I'm really curious: what's the use case for sending custom JS per request?


We offer a lot of customization in our product, so we can eliminate whole blocks of code if we know the customer doesn't have x turned on. We also have several geo-based products, so we know we don't need to send GDPR-related code to users in the USA, for example. The goal is to ship the minimal amount of code to the end user.


That still doesn’t explain why you can’t cache. I would think that each code snippet itself is still cacheable. Or do you amalgamate all of them into a single download? Even still, I would expect most users to be fetching the same amalgamation, in which case caching should be beneficial…


We do combine everything into a single file. To clarify, we use browser caching, but we can't utilize any traditional CDN because if a customer embeds host.com/js as a JS file, the content served at that path might be different for user A vs user B.


I’d love to poke at this if you don’t mind. One advantage of downloading everything is that you only download it once. If the user is enabling/disabling features, you’ll have to then re-ship a new js bundle, no?


At the scale we operate at we would probably have billions, if not tens or hundreds of billions, of downloads before any feature status is changed. Typically our customers are not turning on and off features very frequently.


Also my thoughts. With something like Qwik you get tiny chunks that you only download when needed and are highly cacheable. All automated!


Not impossible for us to use something like that, but remember that we are a vendor that sits on customer sites with very limited objectives; we aren't powering a full-on webapp or multi-page experience. Typically we want to load our experience/component as quickly as possible after some sort of user interaction, and that interaction might come from the customer's own app/website.


Ah I see, that makes a lot of sense.


I fail to understand why this, and why now. For these minimal gains (compared to Brotli), servers and CDNs will need to increase memory and disk space to store the cached responses in zstd.


We don't use brotli for any responses under 25KB, and in those cases, for us, zstd is the clear winner. We are still tweaking zstd and expect to improve its performance to bring it in line with brotli. Additionally, our biggest expense per month is egress, so any savings go a long way for us.


Try using the cgo-based zstd package https://github.com/valyala/gozstd . It is usually faster than the pure-Go package https://github.com/klauspost/compress/tree/master/zstd


Yep, this was validated earlier but deemed not worth the effort. We might revisit it down the road, but we already deal with cgo for Brotli and it's a pain. We run Linux servers but Macs for development, so we have to keep the toolchain working across both, and that isn't free.


A very happy moment for Yann, no doubt; he mentioned Google's competing compression efforts directly impeding his work in a CoRecursive podcast interview ( https://corecursive.com/data-compression-yann-collet/ )

For the rest of us, another half-decade+ wait to get useful levels of compatibility, especially on mobile.


Simple: Brotli initially shipped as part of the WOFF2 compressed font format, and its creators foresaw more possible uses in the web browser, tailoring its built-in preset dictionary for web content. There is no doubt that Zstandard is faster than Brotli, but both are fast enough for web content (100 MB/s and more), so the additional gain from Zstandard was much smaller. Given this, I believe Zstandard support in web browsers happened only due to the strong push from Meta.


> For the rest of us, another half-decade+ wait to get useful levels of compatibility especially on mobile

Chrome already ships it with their currently released browsers, so you get a significant amount of mobile traffic (about 42%) from Android devices supporting it already. I don't know what the current status of Blink on iOS is, but once that releases, you'll also get iOS users.

WebKit has reacted positively to the proposal, though the Github issue documenting their position refers to an internal radar ticket so I have no idea what their timeline is.

If you build for Chrome today, compatibility will only grow. Chrome doesn't accept compression levels above a certain point, but I doubt Safari and Firefox will impose a different limitation when support eventually lands.

Plus, if your web server is configured well, you can have the gzip/brotli/zstd content available right next to each other, and just follow the browser's preference to pick a supported compressed asset.

There really are no downsides here.
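To make the "follow the browser's preference" part concrete, here's a rough Go sketch of serving precompressed siblings (app.js.zst / app.js.br / app.js.gz); not production code: no q-value parsing, and Content-Type is set by hand:

    package main

    import (
        "io"
        "net/http"
        "os"
        "strings"
    )

    // servePrecompressed looks for a precompressed sibling of path
    // (path + ".zst" / ".br" / ".gz") matching the client's
    // Accept-Encoding header and falls back to the plain file.
    func servePrecompressed(w http.ResponseWriter, r *http.Request, path string) {
        accept := r.Header.Get("Accept-Encoding")
        for _, c := range []struct{ coding, ext string }{
            {"zstd", ".zst"}, {"br", ".br"}, {"gzip", ".gz"},
        } {
            if !strings.Contains(accept, c.coding) {
                continue
            }
            f, err := os.Open(path + c.ext)
            if err != nil {
                continue // no precompressed variant on disk, try the next one
            }
            defer f.Close()
            w.Header().Set("Content-Encoding", c.coding)
            w.Header().Add("Vary", "Accept-Encoding")
            io.Copy(w, f)
            return
        }
        http.ServeFile(w, r, path)
    }

    func main() {
        http.HandleFunc("/app.js", func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "application/javascript")
            servePrecompressed(w, r, "./static/app.js")
        })
        http.ListenAndServe(":8080", nil)
    }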


If you “build for Chrome today” you create a Chrome-only monopoly, exactly as web devs did with IE.

Or you could build your site properly and have it work on more than a single browser.

You do you.


Other browsers will implement the feature soon. This isn't like WebGPU or WebUSB, which just appeared before other browsers even reacted.

If Firefox and WebKit do introduce compatibility issues, you can always correct those later, though I doubt there will be any.


The web still works without content encoding, so you optimize for the common case but it still works for everyone.


Because of HTTP content negotiation, you can serve zstd to Chrome users today, so “useful levels of compatibility” doesn’t make sense in this context.


Try saying that out loud when you have 1 TB to recompress and host for a fraction of users just to save some CPU. And if we're talking CPU, we're talking latency, and there's not much point in switching if you're running zstd --ultra --long -22 for every request.


Sane choice, albeit delayed, as Zstandard has been the leading standard algorithm in the field for quite some time. I tested most of the contenders while developing Pack [1], and Zstandard looked like the best alternative to good old DEFLATE.

The dictionary feature [2] will help design some new ways of getting small resources.

[1] https://news.ycombinator.com/item?id=39793805

[2] https://facebook.github.io/zstd/zstd_manual.html#Chapter10


Who would have thought that Chrome can add new compression formats too as opposed to removing them!

This is nice – but please, bring back JPEG XL too.


I think zstd is a bigger jump over the existing alternatives, in terms of performance/compression ratio, than JPEG XL is. At least that's the kind of thing Mozilla seemed to be saying on the topic.


i hope they do, but i doubt it. they should be ashamed


It would be nice if CompressionStreams would also get brotli/zstd support. I wonder if the JS engine uses the same implementation as the browser.


I am surprised all major browsers didn't implement this years ago. Zstd is way better than gzip and brotli.


How much better is Zstd? Compression seems like one of those things where diminishing returns kick in very quickly. Whenever I need compression, the approach has always just been "throw gzip at it because it does a lot without much of a performance hit". Basically it does the job.


Try this: time tar -cf - /usr/share/doc | gzip | wc -c vs. time tar -cf - /usr/share/doc | zstd | wc -c

Repeat a few times to warm up your disk cache if needed. On my host (with an NVMe disk), zstd got a slightly better compression ratio than gzip, but took 1 second instead of 9 seconds to compress. Compare against something like lzop, which is about the same speed as zstd but produces much worse compression.

Of course, with gzip, if you have multiple cores you have the option of using pigz, which brings the wall-clock time of gzip down to something comparable to zstd and lzop.


> with gzip if you have multiple cores you have the option of using pigz which bring the wall-clock time of gzip down to comparable to zstd and lzop.

(But then you should use zstd -T0 for an apples to apples comparison.)


Thank you - that's a great, simple test.


For a benchmark on a standard set: https://github.com/inikep/lzbench/blob/master/lzbench18_sort... Of course, you may get different results with another dataset.

gzip (zlib -6) [ratio=32%] [compr=35 MB/s] [dec=407 MB/s]

zstd (zstd -2) [ratio=32%] [compr=356 MB/s] [dec=1067 MB/s]

NB1: The default level for zstd is -3, but the table only had -2. The difference is probably small. The level range is 1-22 for zstd and 1-9 for gzip.

NB2: The default gzip program (at least on Debian) is the executable from zlib. With my workflows, libdeflate-gzip is compatible and noticeably faster.

NB3: This benchmark is 2 years old. The latest releases of zstd are much better, see https://github.com/facebook/zstd/releases

For high compression, according to this benchmark, xz can do slightly better if you're willing to pay a 10× penalty on decompression speed.

xz -9 [ratio=23%] [compr=2.6 MB/s] [dec=88 MB/s]

zstd -18 [ratio=25%] [compr=3.6 MB/s] [dec=912 MB/s]


If you are not too constrained by I/O rate, Zstandard has no match; it can easily clock more than 1 GB/s with a good compression ratio, and it can also automatically adapt to a changing I/O rate. Web browsers typically have much less bandwidth available, though, so Brotli and Zstandard are virtually identical for clients.


It's not so much about I/O rate as about access times.

If compression and decompression speed are critical, lz4. Zstd for pretty much everything else.

There are edge cases where compression time doesn’t matter but decompression time does. This used to be the case for game torrents back in the old days, and UHARC was used for that to great effect. Not sure what the current king is for that purpose.


Something like FreeArc would typically be used nowadays for repacks - there are gains to be made by tailoring your procedure to the type of game asset, and some of those decisions won't apply to general compression.

FreeArc Next does actually use zstd as above but it also does a lot of tricks with compression, dictionaries, etc while taking much longer to process.

As an example, looking at FitGirl's COD:BO3 repack, 180GB->42.4GB entirely losslessly. Not sure how regular compression would fare on the original game, though.


One of the systems I maintain at work uses enormous, highly compressible, text files, which need to be compressed once and decompressed many times. Decompression speed isn't critical, it just needs to keep up with processing the decompressed data to avoid being the bottleneck. We optimize primarily for compression ratio.

For that system, we haven't found something that beats `xz -9`.


Brotli has a higher potential compression ratio due to its internal structure, so that edge case would be better served by Brotli than by Zstandard.


Hardly surprising when you consider that gzip's DEFLATE dates from around 1990 and was developed by one guy as shareware, while zstd was produced by a megacorp in 2015.


Zstandard was also mainly designed and produced by a single person, Yann Collet.


Yann also developed LZ4


Yes, but they also work for megacorp Facebook, and according to https://github.com/facebook/zstd/graphs/contributors, 300+ other contributors have made 4500+ commits to the zstd repo.

It's not quite as small-scale as 90's-style shareware was.


zstd has improved a lot since it was brought under the Facebook roof, but those are incremental changes; it already existed before then.

Here's the good old blog about that http://fastcompression.blogspot.com/


I recommend the CoRecursive episode about how LZ4 and zstd came to be. It started with making games for a HP calculator. https://corecursive.com/data-compression-yann-collet/


Yes, it's worth taking a minute to appreciate how well DEFLATE and gzip have stood up over the years. It's a brilliant format and tool. I was definitely a little too ready to believe the paper that claimed gzip's ability to classify text was better than deep neural nets; alas, it has some limits after all!


Highly dependent on several factors.

I was comparing about ten compression algos for compressing JSON data, mainly needing something reasonably fast that compresses well. zstd did well, but brotli absolutely crushed it on every metric. Of course, it's a single data point, but it exists.


I implemented this in my server (https://substrata.info/). Works great; results in smaller files than zlib/deflate. One thing to watch out for: compression levels >= 20 aren't handled by Chrome clients.


Finally. I hope we get zstd into the Go standard library. It feels kind of strange that we have to fetch a third-party library for that.


Only if it's backed by security-reviewed and fuzzed asm. Go's gzip implementation is slow enough that klauspost is worth importing; following that pattern, I would probably still use klauspost's zstd even if Go had a slow reference implementation in the standard library.

What I'm really hoping for is something as seamless as gzhttp but for zstd. https://github.com/klauspost/compress/tree/master/gzhttp
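Until something like that exists, a hand-rolled version isn't huge. A rough sketch (assuming the klauspost zstd API; no encoder pooling, naive Accept-Encoding matching), not a drop-in gzhttp equivalent:

    package main

    import (
        "net/http"
        "strings"

        "github.com/klauspost/compress/zstd"
    )

    // zstdWriter routes the response body through a zstd encoder.
    type zstdWriter struct {
        http.ResponseWriter
        enc *zstd.Encoder
    }

    func (z *zstdWriter) Write(p []byte) (int, error) { return z.enc.Write(p) }

    func (z *zstdWriter) WriteHeader(code int) {
        // Any Content-Length set by the inner handler is wrong after compression.
        z.ResponseWriter.Header().Del("Content-Length")
        z.ResponseWriter.WriteHeader(code)
    }

    // ZstdHandler compresses responses for clients that advertise zstd support.
    func ZstdHandler(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if !strings.Contains(r.Header.Get("Accept-Encoding"), "zstd") {
                next.ServeHTTP(w, r)
                return
            }
            enc, err := zstd.NewWriter(w)
            if err != nil {
                next.ServeHTTP(w, r)
                return
            }
            defer enc.Close() // flushes the final frame
            w.Header().Set("Content-Encoding", "zstd")
            w.Header().Add("Vary", "Accept-Encoding")
            next.ServeHTTP(&zstdWriter{ResponseWriter: w, enc: enc}, r)
        })
    }

    func main() {
        http.Handle("/", ZstdHandler(http.FileServer(http.Dir("./static"))))
        http.ListenAndServe(":8080", nil)
    }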


I just contributed zstd decoding to httpx, my favourite Python HTTP library!


The second document has a link to a benchmark [1] by PeaZip which shows that 7-Zip is by far the best if you have a bit of time to compress (241s); it's the fastest at decompression and has the best ratio. Interesting. How is the support across languages? Are browsers planning to implement it as well?

[1] https://peazip.github.io/fast-compression-benchmark-brotli-z...


7z is an archive format; it just uses LZMA2 compression, I believe.


Well it's also an LZMA implementation, and an implementation can be fast or slow, good or bad.

From what I gather 7zip tends to be quite a bit faster than xz, since it's been worked on/optimized quite a bit more (I didn't test this myself to verify, but I've seen several people comment on that over the last few years).


I stan zstd hard. Hopefully now that browsers support it, HTTP middleware libraries will finally support it too. I already use it everywhere else.


There is some recent development activity on Firefox's zstd bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1301878


Here's the Microsoft .NET and ASP.NET feature tracking:

Add support for Zstandard to System.IO.Compression: https://github.com/dotnet/runtime/issues/59591

Support zstd Content-Encoding: https://github.com/dotnet/aspnetcore/issues/50643


Another important variable is reverse proxies.

You don't necessarily need Rust or Node.js or Java to support zstd, but you do need Traefik and nginx and HAProxy to do so.


That's an excellent point.

Frustratingly I use YARP, so I still need MS to implement zstd hah.


For "realtime" compression, wouldn't lz4 make more sense?

Much lighter on both the server and the client.


LZ4 has quite poor compression; its advantage is speed. The difference between decompressing ten megs of data with LZ4 versus zstd is completely negligible when the difference in bytes sent over the wire might approach a full megabyte.


Will zstd be added to the compression streams API for use by JavaScript as well?


Stupid question: Why do we keep adding things like this to the browser when they could be a library on the system?


I feel like I’m missing something here. How would having a library on the system help with serving zstd compressed assets?


Browsers benefit from having deep integration with the HTTP stack. You could build a browser around libcurl, but none of the majors are built like that.


As usual, I am in the skeptic camp when Chrome people add features. There's an unexplainable suspicion that they do it as part of an EEE strategy, like the Web Integrity API. Remember folks: if you love the open web, you'd advocate using a non-Chrome browser.


They're implementing a widely used compression standard created by a developer at a competing company.

Even if Google is the absolute most evil company in the world, they will still make many smaller decisions every year that are just reasonable engineering. If you forget that, you end up going down a conspiracy rabbit hole and losing any ability to understand the world.


Or you can look at whether Firefox will add the same feature. They do.


This may not age well…


> Zstd is roughly three times faster than Brotli for decompression. Combined with zstd being faster at compression, this will result in faster page load times.


Can we expect xz support next?


They are having some problem compiling the test suite in Chrome.


No. LZMA-based formats like xz are too slow for web transfer speeds.


Isn't LZMA decompression pretty fast? For static assets its slow compression doesn't matter.


The cost of compression isn’t nothing. A lot of Linux distros have moved off of it for packages since they heavy Zstd compression was close enough on size while saving a lot of time & CPU bandwidth.


since the* heavy Zstd compression


hahaha



