
This is interesting, but not for the reasons Jeff suggests. bzip uses all 8 cores for 21 minutes to produce a 986M file (or 2 minutes for 1092M), while 7zip doesn't use all 8 cores, and produces a file smaller than anything bzip can produce, in 5 minutes.

So it looks like 7zip is not just slightly better than bzip; it's much better. Ideally you can utilize all your cores by piping data from the DB directly into the compressor -- the compressor will use 2 cores (or whatever), and your database will use the rest.
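A rough sketch of the piping idea, using Python's standard `lzma` module: the dump is compressed as it streams out, so the full uncompressed file never has to be written first. `fake_dump` is a made-up stand-in for whatever produces the database dump, and `compress_stream` is a hypothetical helper name, not part of any tool mentioned here.

```python
import lzma
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress an iterable of byte chunks incrementally into one .xz stream."""
    comp = lzma.LZMACompressor()
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # the compressor may buffer input and return nothing yet
            yield out
    yield comp.flush()  # emit whatever is still buffered, plus the stream footer


def fake_dump(n: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a database dump: yields one CSV-like row at a time."""
    for i in range(n):
        yield f"{i},row-{i}\n".encode()


if __name__ == "__main__":
    compressed = b"".join(compress_stream(fake_dump(1000)))
    original = b"".join(fake_dump(1000))
    print(len(original), "->", len(compressed), "bytes")
```

In a real setup the producer would be the database's dump process writing to a pipe, and the two would run as separate processes so each gets its own cores.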




Bzip2 is a very slow compression algorithm, mostly due, from what I recall, to the Burrows-Wheeler transform that lies at its core. LZMA is pretty much superior in every respect; it is (though quite slowly) on the road to replacing the aging Bzip2.

And as usual the comments on CodingHorror (at least the initial dozen or two) show relative ignorance of the topic. 7zip, like any compressor, can be trivially parallelized just by running it simultaneously on separate solid blocks. The compression-ratio cost of a smaller solid block size is generally near zero when dictionary size << input data size.

The included Windows interface doesn't allow this kind of threading AFAIK, but it would be relatively simple to implement in an app using the LZMA libraries.
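A minimal sketch of the block-parallel approach, using Python's standard `lzma` module rather than the 7zip SDK: the input is cut into independent blocks, each block is compressed on its own worker, and the resulting .xz streams are concatenated. The function name, block size and worker count are illustrative assumptions, not anything 7zip does internally.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(data: bytes, block_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress independent blocks in parallel and concatenate the .xz streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]  # still emit one valid (empty) stream for empty input
    # CPython's lzma module releases the GIL while compressing, so threads
    # can run on separate cores here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        compressed = list(pool.map(lzma.compress, blocks))  # map preserves order
    return b"".join(compressed)


if __name__ == "__main__":
    payload = b"some fairly repetitive sample data\n" * 200_000
    packed = compress_blocks(payload)
    # lzma.decompress accepts a concatenation of several streams.
    assert lzma.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

Each block starts with an empty dictionary, so the ratio suffers if the blocks are small relative to the dictionary size; that is the tradeoff described above.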


This is the exact approach taken by pigz (Parallel GZIP) - http://www.zlib.net/pigz/


> and uses the zlib and pthread libraries

Er, no, thanks. What about a good STL C++ implementation with OpenMP (automagic on STL)?

That's great for us, developers. We'll never run out of things to do :)


What's the problem? pigz works as advertised for me. Almost a 4x speed increase on a 4-core system, assuming your storage I/O can keep up.



