



On multiple occasions I have found lbzip2[0] to be faster than pbzip2.

[0] http://lbzip2.org

Edit: here's a quick sample (file is read from tmpfs on a 2-core/4-thread i7-5500U):

  $ time lbzip2 < linux-4.13.3.tar > /dev/null
  real    0m17.410s
  user    1m7.799s
  sys     0m0.283s
  $ time pbzip2 < linux-4.13.3.tar > /dev/null
  real    0m30.556s
  user    1m57.557s
  sys     0m2.169s


I have run into cases where the compressed output of pbzip2 was not readable by a .NET library a customer was using (I'm afraid I can't remember which one). Fortunately, an alternative multithreaded bzip2 implementation, lbzip2 (http://lbzip2.org/), did not suffer from this problem, so it's worth keeping in mind just in case.


If you're zipping multiple files, is it better to pbzip2 each file, or to parallel bzip2 them? You wouldn't want to parallel pbzip2, would you?


By compressing multiple (large) files sequentially, you can start freeing disk space much sooner, and you need less free space to hold the compressed output.
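To make the sequential-versus-parallel trade-off concrete, here is a minimal Python sketch using the standard `bz2` module rather than pbzip2 or lbzip2. The helper names `compress_and_remove` and `compress_all` are made up for illustration. It assumes each original is deleted right after its compressed copy is written, and that threads are good enough for the parallel case because CPython's `bz2` module releases the GIL while compressing.

```python
import bz2
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def compress_and_remove(path):
    """Write path + '.bz2', then delete the original to reclaim its space."""
    target = path + ".bz2"
    with open(path, "rb") as src, bz2.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only remove the original once the compressed copy is fully written.
    os.remove(path)
    return target


def compress_all(paths, workers=1):
    """Compress every file in paths, running up to `workers` files at once.

    workers=1 handles files one after another, so at most one original and
    its compressed copy coexist on disk at any time. A larger value trades
    that for throughput: up to `workers` originals are in flight together.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_and_remove, paths))
```

Calling `compress_all(paths)` gives the space-friendly sequential behaviour; `compress_all(paths, workers=4)` is the rough equivalent of running plain bzip2 under GNU parallel, one file per worker.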


With large files or sets of files, 7zip can effectively use all the files as one dictionary for the compression (across-file compression, called "solid compression").

However, zip does not support solid compression. This creates the oddity that zipping twice can reduce your file size: duplicate files compress to the same bytes but are stored separately, and the second pass then notices the similarity.

The downside of solid compression is that extracting one file means decompressing the other files in its block. But with modern computers that's not as bad as it used to be, and modern 7zip doesn't extract "all" files, only the ones affected by that block.
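The "zipping twice" effect is easy to reproduce with Python's standard `zipfile` and `zlib` modules. This is only an illustrative sketch: `build_zip` and the member names are made up, the payload is random bytes so that a single pass cannot shrink it, and the second pass only wins here because the two copies sit within deflate's 32 KiB window.

```python
import io
import os
import zipfile
import zlib


def build_zip(members):
    """Return the bytes of a deflate-compressed zip holding `members`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Two members with identical, incompressible contents.
payload = os.urandom(20_000)
once = build_zip({"a.bin": payload, "b.bin": payload})

# Each member was compressed on its own, so the archive holds two
# byte-identical compressed streams. Compressing the archive as a single
# stream lets the second copy be encoded as references to the first.
twice = zlib.compress(once)

print(len(once), len(twice))
```

The second number should come out at a little over half of the first, which is roughly what a solid archive gets in a single pass.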


True! This could be important if you're running low on disk space. I've been curious: does something like parallel bzip2 cause more fragmentation than a serial approach, or are file systems and drives pretty good at dealing with heavy parallel writes?


Also, with modern enough xz-utils you can just use `xz -T0` for parallel compression (`-T` takes a thread count, and 0 means one thread per available core).



