Edit: here's a quick sample (the file is read from tmpfs on an i7-5500U, 2 cores / 4 threads):
$ time lbzip2 < linux-4.13.3.tar > /dev/null
real 0m17.410s
user 1m7.799s
sys 0m0.283s
$ time pbzip2 < linux-4.13.3.tar > /dev/null
real 0m30.556s
user 1m57.557s
sys 0m2.169s
I've run into situations where the compressed output of pbzip2 was not readable by a .NET library a customer was using (I'm afraid I can't remember which one). Fortunately an alternative multithreaded bzip2 implementation, lbzip2 (http://lbzip2.org/), did not suffer from this problem, so it's worth keeping in mind just in case you hit similar compatibility issues.
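If you're worried about a downstream consumer choking on multithreaded output, a quick sanity check (just a sketch; data.tar is a placeholder filename) is to round-trip the archive through the stock single-threaded bzip2:
$ lbzip2 -k data.tar                 # compress, keep the original (-k)
$ bzip2 -t data.tar.bz2 && echo ok   # -t verifies the stream with the reference bzip2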
By compressing multiple (large) files sequentially you can start freeing disk space much sooner, and less free space is needed to hold the compressed output.
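A minimal sketch of that pattern (assuming a directory of .tar files; like bzip2, lbzip2 deletes each input file once it has been compressed successfully, so space is reclaimed as the loop progresses):
$ for f in *.tar; do lbzip2 "$f"; done   # each original disappears as soon as its .bz2 is written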
With large files or large sets of files, 7zip lets you use the whole archive contents as one dictionary for compression (across-file compression, called "solid compression").
Zip, however, does not support solid compression. This creates the oddity that zipping an archive a second time can reduce its size: duplicate files compress to the same data but are stored separately, and the second pass then notices the similarity.
The downside of solid compression is that to extract one file you have to decompress the whole solid block it belongs to. With modern computers that's not as bad as it used to be, and modern 7zip doesn't decompress "all" files, only the block(s) the requested files live in.
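If you want to see the difference yourself, 7z's -ms switch toggles solid mode (some_dir/ is a placeholder):
$ 7z a -ms=on  solid.7z    some_dir/   # solid: files share one dictionary, duplicates nearly vanish
$ 7z a -ms=off nonsolid.7z some_dir/   # non-solid: each file compressed independently
$ ls -l solid.7z nonsolid.7z           # compare the sizes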
True! This could be important if you're running low on space. I've been curious: does something like parallel bzip2 cause more fragmentation than a serial approach, or are file systems and drives pretty good at dealing with heavy parallel writes?
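One rough way to check on Linux (filefrag comes with e2fsprogs; the filename is just an example) is to count extents on the compressed output from each tool and compare:
$ filefrag big.tar.bz2   # reports how many extents the file occupies; more extents = more fragmentation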
aria2 for downloads: https://aria2.github.io/
pigz and pbzip2 for compression: https://zlib.net/pigz/ and http://compression.ca/pbzip2/
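Example invocations as a sketch (the URL and filenames are placeholders; -x/-s set aria2's connection count, -p sets the compressors' thread count, -k keeps the input file):
$ aria2c -x 8 -s 8 https://example.com/big.iso   # download over up to 8 connections
$ pigz -k -9 -p 8 big.iso                        # parallel gzip with 8 threads
$ pbzip2 -k -p8 big.iso                          # parallel bzip2 with 8 processors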