
It might be that gzip isn't actually a good format for compressing this data. I would think a compression algorithm that expects a rather large 8-bit character space wouldn't be very suitable for a 4-bit space.



I got a little nerd sniped by this. I wrote a program that takes an input (/usr/share/dict/words on Debian 12, specifically), and compresses it with gzip and Zstandard, and also expands its binary representation (by replacing each 1 bit with A and each 0 bit with B) and compresses that. The result is:

    gzip      expanded+gzip   zstd      expanded+zstd   original
    263120    421278          252449    389004          985084
So unlike in the article, expansion + compression is at least smaller than the original. The ratios of compressed size to original size, in the same column order, are 0.27, 0.43, 0.26, and 0.39 (smaller is better; that is the opposite of how most compression algorithms advertise their performance, but it is what the original article used). gzip and Zstandard aren't a lot different in either case. Whatever patterns the JavaScript weirdifier uses are less obvious to compression algorithms in general than plain bitwise substitution.

Here's the program if you want to look for bugs: https://go.dev/play/p/wwNXVzO2TO-



