
It would be good if zstd had a standardised static web dictionary, like Brotli's: https://www.rfc-editor.org/rfc/rfc7932#appendix-A. That would reduce the overhead for small files, since boilerplate like <!DOCTYPE html> would already be in the dictionary.



Brotli's baked-in dictionary always irked me because it's forever fixed and can't be updated (not that I'm implementing new hypermedias on a weekly basis but still). I'd much rather see broad adoption of plain `Content-Type: zstd` implemented with no dictionary, and later go through a standards process to add named content-specific dictionaries like `Content-Type: zstd-web` or `zstd-cn` or whatever.

Edit: Actually this is already considered in RFC-8878 [0]. The RFC reserves zstd frame dictionary ids in the ranges: <= 32767 and >= (1 << 31) for a public IANA dictionary registry, but there are no such dictionaries published for public use yet.

[0]: https://datatracker.ietf.org/doc/html/rfc8878#iana_dict
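To make the ranges quoted above concrete, here is a minimal sketch that classifies a dictionary ID. The function name and the returned labels are made up for illustration, and treating 0 as "no ID declared" is my reading of the zstd format description, not something stated in this thread.

```python
def classify_dictionary_id(dict_id: int) -> str:
    """Classify a 32-bit zstd dictionary ID using the ranges quoted above.

    Labels are illustrative only; no public registry entries exist yet.
    """
    if not 0 <= dict_id <= 0xFFFFFFFF:
        raise ValueError("dictionary ID must fit in 32 bits")
    if dict_id == 0:
        # Assumption: 0 means the frame does not declare a dictionary ID.
        return "unspecified"
    if dict_id <= 32767 or dict_id >= (1 << 31):
        # Low and high ranges reserved for a future public registry.
        return "reserved"
    # Everything in between is free for private use.
    return "private"


if __name__ == "__main__":
    for candidate in (0, 3, 32767, 32768, (1 << 31) - 1, 1 << 31):
        print(candidate, classify_dictionary_id(candidate))
```

So anything you mint for your own deployment today should sit strictly between 32767 and 1 << 31.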


Have you seen this proposal yet? It allows domains to define their own dictionaries for future compressions, with delta updates for changes.

Still seems a bit complicated to me, but could be meaningful for web apps that are required to be large.

https://github.com/WICG/compression-dictionary-transport


Somehow this sounds like another future vector of attack


Perhaps, but at the very least it's gone through multiple rounds of security and privacy review from different groups.


That kind of appeal to authority is myopic because all the major security and privacy issues are introduced by big companies with those teams. They are not that good at red team thinking which is what you need to do. It’s more expensive though and these teams are more about compliance and stopping the most obviously bad ideas only.


That's really not an example of appeal to authority. It's a simple statement of facts. This has passed reviews by Google, Facebook, Apple, Mozilla, and the W3C's security and privacy teams. Make of that what you will.


Or to put it more explicitly `Content-Type: zstd` should have a standard dictionary, since that's far easier to add to new proposals than something widely used.

The brotli dictionary appears to help with random text, not just html/css.


Content-Encoding?


Yes, thank you.


Possibly just use the one from Brotli since it's already standardized. If it's any good then the work is mostly already done, right?


They want it as a standard option (pre-shipped with all decompressors). Not something they could incorporate into a customized fork / client that uses zstd with their own extensions.


The zstd API does allow you to supply your own initial dictionary, so there's no need to fork it to allow a browser implementation to use the brotli dictionary.

Personally, as someone who doesn't work in web, I'm just as happy that zstd is flexible this way. For my applications, the brotli dictionary is pure overhead that bloats the library.
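For anyone who hasn't used a caller-supplied dictionary before, here is a rough sketch of the idea. Note that it uses Python's stdlib `zlib` and its `zdict` parameter as a stand-in, not zstd itself, so it only illustrates the mechanism (both sides hold the same dictionary out of band). `SHARED_DICT` and `PAGE` are made-up sample data.

```python
import zlib
from typing import Optional

# Hypothetical dictionary: boilerplate that both ends agree on in advance.
SHARED_DICT = (
    b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
    b"<title></title></head><body><p></p></body></html>"
)

# Hypothetical small payload that shares most of that boilerplate.
PAGE = (
    b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
    b"<title>Hi</title></head><body><p>Hello</p></body></html>"
)


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Deflate `data`, optionally priming the compressor with `zdict`."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inflate `blob`; the same `zdict` must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    plain = compress(PAGE)
    primed = compress(PAGE, SHARED_DICT)
    print(len(PAGE), len(plain), len(primed))
    assert decompress(primed, SHARED_DICT) == PAGE
```

The dictionary never travels with the payload, which is exactly why the other side needs to have it already.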


Again, since there's confusion.

They want _every zstd decompressor_ to __already have__ the dictionary in question so that it can be specified as part of the standard. E.g. 'instead of empty / an initial in-file dictionary, use the standard dict #3'. Such reference starting dictionaries would not be included in .zstd files, but would be shipped with the compressor source code.
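To illustrate what "use the standard dict #3" would look like on the wire: a zstd frame header can carry a dictionary ID, and a decompressor would look that ID up in whatever dictionaries it ships with. A minimal sketch of reading that field, based on my understanding of the frame header layout in RFC 8878 (the function name is made up, and "dict #3" is hypothetical since no standard dictionaries exist):

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, little-endian


def frame_dictionary_id(frame: bytes) -> int:
    """Return the dictionary ID declared in a zstd frame header, or 0 if none."""
    if frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    if len(frame) < 5:
        raise ValueError("truncated frame header")
    descriptor = frame[4]
    # Low two bits select the width of the dictionary ID field.
    did_size = (0, 1, 2, 4)[descriptor & 0x03]
    # Bit 5 is the single-segment flag; when set, no window descriptor follows.
    single_segment = bool(descriptor & 0x20)
    offset = 5 if single_segment else 6
    field = frame[offset:offset + did_size]
    if len(field) != did_size:
        raise ValueError("truncated frame header")
    return int.from_bytes(field, "little")


if __name__ == "__main__":
    # Hand-built header: 1-byte dictionary ID of 3, window descriptor present.
    header = ZSTD_MAGIC + bytes([0x01, 0x00, 0x03])
    print(frame_dictionary_id(header))
```

A decompressor with pre-shipped dictionaries would map that number to built-in data instead of requiring the caller to pass a dictionary in.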


There is an issue tracking this with a bunch of links to discussions about it, but it seems the maintainers still haven't had time for it.

https://github.com/facebook/zstd/issues/3100

This was the first place my mind went when I saw this Content-Encoding announcement, so I ran and re-checked the issue :(.


It's not a standard dictionary, but you can use a custom shared one. https://developer.chrome.com/blog/shared-dictionary-compress...


Yeah zstd is nice on its own, but a proper dictionary will give you several times the compression ratio on top of that, plus super fast decompression. It's just absurdly good for specialized data.


Do Linux distributions use a dictionary with their package manager? Since their packages are typically zstd compressed, every distro (version) release could have its own accompanying dictionary.


Dictionaries only help with really tiny files.

> Typical gains range ~10% (at 64KB) to x5 better (at <1KB).

https://www.manpagez.com/man/1/zstd/zstd-1.1.0.php

Distros are unlikely to ship many packages smaller than 64 KiB, so the advantage of a dictionary rapidly diminishes for this use case.



