Why you should always use gzip instead of deflate compression (stackoverflow.com)
101 points by tbassetto on March 25, 2012 | 17 comments



The "CRC-32 is slow" sentiment feels like a bit of a straw-man argument against gzip. With today's computing power, the difference is nigh-unto negligible: checksumming is dwarfed by the actual decompression, or maybe even the network overhead/latency.

Gzip is, doubtless, better for the reasons laid out in the article, but why aren't we moving forward? Do any browsers support LZMA or bzip2? Would they be at all worth the effort? I assume not in the case of HTML/resources, but maybe in raw non-compressed binary streams.


The largest assets will already be compressed (e.g. images and/or video). HTTP compression really only benefits textual content. Gzip gives you a massive space saving over uncompressed textual data, but after that you're into diminishing returns. Although bzip2 is better than gzip, it's not a lot better.
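As a quick illustration (a Python sketch, not a benchmark; the random bytes stand in for already-compressed image/video data):

    import gzip
    import os

    # Text shrinks dramatically under gzip; data that is already compressed
    # (approximated here by random bytes) barely shrinks at all.
    text = b"<div class='item'>hello world</div>\n" * 2000
    random_blob = os.urandom(len(text))

    for name, data in (("text", text), ("random", random_blob)):
        out = gzip.compress(data)
        print(f"{name:6} {len(data)} -> {len(out)} bytes "
              f"({100 * len(out) / len(data):.0f}%)")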


bzip2 compresses significantly better than gzip (I typically see a delta of between 10% and 30% in filesize) but it comes at a significant cost: compressing/decompressing the same file generally takes many times longer. xz compresses even better but takes even longer.
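A rough way to see both effects at once (Python sketch; point it at any large text file you have around, since the exact numbers vary a lot with the input):

    import bz2
    import gzip
    import lzma
    import time

    # Compare compressed size and compression time for one text blob.
    data = open("/usr/share/dict/words", "rb").read()  # any large text file

    for name, compress in (("gzip", gzip.compress),
                           ("bzip2", bz2.compress),
                           ("xz", lzma.compress)):
        t0 = time.perf_counter()
        out = compress(data)
        print(f"{name:5} {len(out):>9} bytes  {time.perf_counter() - t0:.2f}s")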

Also, some web servers (or at least Apache) will serve gzipped static content automagically if a matching gzipped file exists. For example, if you have /var/www/html/foobar.css.gz, Apache will serve the contents of that file directly when the client requests http://example.com/foobar.css. So if you're running a stock Apache configuration, there's no reason not to just run gzip on all of your static assets and get lower bandwidth bills right away.
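Precompressing is basically a one-liner per file; a throwaway Python version (the document root and extensions are just placeholders for whatever your static tree looks like):

    import gzip
    import pathlib

    STATIC_ROOT = pathlib.Path("/var/www/html")  # assumed document root

    # Write foo.css.gz next to foo.css (and .js/.html) at max compression.
    for path in STATIC_ROOT.rglob("*"):
        if path.is_file() and path.suffix in {".css", ".js", ".html"}:
            gz_path = path.with_name(path.name + ".gz")
            gz_path.write_bytes(gzip.compress(path.read_bytes(), compresslevel=9))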


How many raw non-compressed binary streams do you need to deliver? Most assets a Web page requires are textual (CSS/JS) or already compressed: images, video, Flash content...


You need to consider that once you add something, you need to support it for the indefinite future. So we need to give serious consideration to things that are added. Does LZMA provide a sufficiently large compression ratio over gzip? Does the compression and decompression time introduce a lot of latency, especially on mobile devices? Does that increased latency mean plain text is faster?


That's why we have the Accept-Encoding header. Neither clients nor servers would need to support alternate compression schemes, either now or in the future. But if they both did, maybe there could be some benefit.
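Something like this, minus q-value handling, is all the server side amounts to (a minimal sketch, assuming gzip is the only alternate encoding on offer):

    import gzip

    def encode_body(body: bytes, accept_encoding: str):
        # Pick an encoding the client advertised (q-values ignored),
        # otherwise send the body unencoded.
        offered = {token.split(";")[0].strip()
                   for token in accept_encoding.split(",")}
        if "gzip" in offered:
            return gzip.compress(body), "gzip"
        return body, None

    body, encoding = encode_body(b"<html>...</html>", "gzip, deflate")
    print(encoding)  # -> gzip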


That's the problem: once one major browser implements it, they all need to. Then we have servers and browsers that become more and more complex, not that that isn't the case already. Thinking carefully about whether something is a good idea is important. Personally I think compressing data with LZMA or bzip2 would introduce a lot of latency, more than the time to send the data uncompressed; and that's before we talk about needlessly burning server CPU to compress it (it all adds up). Then you have SPDY, which implements compression at the protocol level.


It's not as simple as "forward" (with "backward" implied); the landscape has many more dimensions: CPU, memory, code size overhead, etc.


Adler32 is very broken, but it doesn't really matter, as you can use other checksums on top. http://www.leviathansecurity.com/blog/archives/16-Analysis-o...

Browsers should just stop supporting deflate and stop advertising it in the request, and servers should drop it too. Ideally gzip support should be required, not negotiable.


On the other hand, the quality of Adler32 is hardly a valid argument here, as HTTP without compression does not use any checksums whatsoever. The fact that it's not consistently implemented is.

By the way, I don't think it's right to call Adler32 broken, as there are much worse algorithms in common use (the TCP checksum, for example), and the limitations of Adler32 are widely known and are not too relevant for this application.
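The main known limitation is weak mixing on short inputs: both 16-bit halves stay close to the plain byte sum, so short messages can only ever hit a tiny slice of the 32-bit output space. A quick Python illustration:

    import zlib

    # Adler-32 of short inputs clusters near the byte sums; CRC-32 spreads
    # even one-byte inputs across the full 32-bit range.
    for data in (b"a", b"ab", b"deflate", b"gzip"):
        print(f"{data!r:12} adler32={zlib.adler32(data):#010x} "
              f"crc32={zlib.crc32(data):#010x}")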


Luckily once we all switch to IPv6 we will no longer have TCP checksums :-)


Good post, but it would be great if "for HTTP" was added to the title. This has nothing to do with other use cases.


It's interesting to learn about the web browser and server compatibility problems.

In the general case, I recommend one of two compression algorithms:

* lzo, for when compression or decompression speed matters most

* lzma, for when file size matters most
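For the lzma side of that trade-off, the preset already exposes most of it (a sketch using Python's standard-library lzma module; the numbers depend entirely on the input):

    import lzma
    import time

    # Higher presets buy smaller output at the cost of compression time.
    data = b"A typical line of log output, repeated many times over.\n" * 50_000

    for preset in (0, 6, 9):
        t0 = time.perf_counter()
        out = lzma.compress(data, preset=preset)
        print(f"preset {preset}: {len(out):>8} bytes in "
              f"{time.perf_counter() - t0:.2f}s")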


LZO, though, has the potential problem or benefit (depending on your situation) of not being readily incorporable into proprietary software without navigating Oberhumer's commercial-license maze, unless there's some form of clean-room reimplementation around that I don't know about. (“LZO Professional” at http://www.oberhumer.com/products/lzo-professional/ claims to be available only as a “binary-only evaluation library” under NDA; I don't know whether “evaluation” in this context would obstruct use in an actual product.) The zlib license makes it much easier to deploy. liblzma seems to be in the public domain.


I think LZO has been obsoleted by Snappy anyway.


Even where the GPL is fine, LZO consists of some pretty scary code. Though at this point, I think probably enough people smarter than me have understood it and deemed it safe (it's part of the Linux kernel).

Snappy is fine as long as you can compile C++ and link against the C++ standard library, as it uses std::string for byte buffers for some strange reason. I can only assume it's mainly used for compressing strings at Google, but unless someone ports it to C or at least rips out the standard library dependency, that will preclude its use in some embedded systems, or the Linux kernel.

As a substitute for LZO or Snappy, I recently integrated the BSD-licensed LZ4 [1] into a project where no C++ standard library was available. Dependencies are minimal, memory use is predictable and the code is actually readable. Speed and ratios are comparable to LZO and Snappy.

[1] http://code.google.com/p/lz4/
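If you just want a quick feel for LZ4 from Python rather than the C sources (assuming the third-party python-lz4 bindings, which are not in the standard library), a round-trip looks roughly like:

    import lz4.frame  # third-party "lz4" package

    data = b"the quick brown fox jumps over the lazy dog\n" * 10_000
    compressed = lz4.frame.compress(data)
    assert lz4.frame.decompress(compressed) == data
    print(f"{len(data)} -> {len(compressed)} bytes")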


lzma takes long enough to compress suitably large files that it's not always ideal for operations. While CPU time is generally cheaper than storage space, archiving systems don't always have the most powerful chips in them and tend to favor the cheapest storage possible.



