acme (189)

Leon Brocard (aka acme) is an orange-loving Perl eurohacker with many varied contributions to the Perl community, including the GraphViz module on the CPAN. YAPC::Europe was all his fault. He is still looking for a Perl Monger group he can start which begins with the letter 'D'.

Journal of acme (189)

Tuesday January 08, 2008
08:32 AM


[ #35330 ]
I've talked about fast compression before, but how about slow compression? Enter the Lempel-Ziv-Markov chain algorithm (LZMA):

$ gunzip perl-5.10.0.tar.gz
$ cp perl-5.10.0.tar ..
$ time gzip -9 perl-5.10.0.tar
real    0m11.490s
user    0m11.405s
sys     0m0.088s
$ cp ../perl-5.10.0.tar .
$ time bzip2 -9 perl-5.10.0.tar
real    0m17.501s
user    0m16.857s
sys     0m0.300s
$ cp ../perl-5.10.0.tar .
$ time lzma -9 perl-5.10.0.tar
real    2m0.121s
user    1m58.735s
sys     0m0.468s

So it's slow. So what?

$ ls -lh perl-5.10.0.tar*
-rw-r--r-- 1 acme acme  12M 2008-01-08 13:18 perl-5.10.0.tar.bz2
-rw-r--r-- 1 acme acme  15M 2007-12-18 17:41 perl-5.10.0.tar.gz
-rw-r--r-- 1 acme acme 9.4M 2008-01-08 13:19 perl-5.10.0.tar.lzma
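
Those sizes are whatever ls -lh rounds them to, so any percentages drawn from them are approximate. As a rough sketch in Python (the names sizes_mb and saving_vs are made up for illustration), the relative savings can be worked out like this:

```python
# Sizes in MB, as rounded by `ls -lh` in the listing above.
sizes_mb = {"gzip": 15.0, "bzip2": 12.0, "lzma": 9.4}


def saving_vs(baseline, candidate, sizes=sizes_mb):
    """Return the percentage by which `candidate` is smaller than `baseline`."""
    return 100.0 * (1.0 - sizes[candidate] / sizes[baseline])


for baseline in ("gzip", "bzip2"):
    pct = saving_vs(baseline, "lzma")
    print(f"lzma is about {pct:.0f}% smaller than {baseline}")
```

With these rounded figures, the lzma file comes out roughly 37% smaller than the gzip one and roughly 22% smaller than the bzip2 one.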

Ahhh, it compresses better. How about decompression?

$ time gunzip perl-5.10.0.tar.gz
real    0m2.014s
user    0m0.792s
sys     0m0.192s
$ rm perl-5.10.0.tar
$ time bunzip2 perl-5.10.0.tar.bz2
real    0m6.231s
user    0m4.916s
sys     0m0.252s
$ rm perl-5.10.0.tar
$ time unlzma perl-5.10.0.tar.lzma
real    0m2.093s
user    0m1.752s
sys     0m0.216s

LZMA compresses well and is pretty fast at decompression: about as fast as gunzip in wall-clock time here, and roughly three times faster than bunzip2. Add another tool to your compression toolbox...
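
If you would rather try the comparison without the command-line tools, here is a minimal sketch using Python's standard gzip, bz2 and lzma modules. It is not what I ran above: SAMPLE is a made-up stand-in for perl-5.10.0.tar, benchmark and CODECS are illustrative names, and lzma is left at Python's default preset rather than -9, since the highest preset needs considerably more memory. Expect different numbers from the command-line runs.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical sample input; swap in the bytes of a real tarball to compare.
SAMPLE = b"All the Perl that's Practical to Extract and Report\n" * 20000

# name -> (compress function, decompress function)
CODECS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    # FORMAT_ALONE is the legacy .lzma container; default preset (assumption).
    "lzma": (lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE), lzma.decompress),
}


def benchmark(data):
    """Time compression and decompression of `data` for each codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        decompress_s = time.perf_counter() - start

        # Guard against a silent round-trip failure.
        if unpacked != data:
            raise ValueError(f"{name} round trip did not reproduce the input")

        results[name] = {
            "size": len(packed),
            "compress_s": compress_s,
            "decompress_s": decompress_s,
        }
    return results


for name, r in benchmark(SAMPLE).items():
    print(f"{name:6s} {r['size']:>8d} bytes  "
          f"compress {r['compress_s']:.3f}s  decompress {r['decompress_s']:.3f}s")
```

The timings are in-process and exclude disk I/O, so treat them as a relative comparison only.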

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • "Development of this algorithm was sponsored by Intel"? :-)

    • It gets worse! Using PAQ, specifically paq8o8, I get:

      $ time ./paq8o8 -5 perl-5.10.0.tar
      real    171m42.046s
      user    169m41.464s
      sys     0m48.303s
      $ ls -lh perl-5.10.0.tar.paq8o8
      -rw-r--r-- 1 acme acme 6.2M 2008-01-08 16:44 perl-5.10.0.tar.paq8o8

      And 30 minutes to decompress. Very small, but very very slow.

      • So, I started with over a hundred megabytes of tarballs from, and got those down to 6MB of Git pack. Once into the Perforce history, I was looking at reducing the ~400MB of Perforce repository even further. After my initial export, it was already something like 250MB of Git pack (I wrote the exporter to make best use of on-the-fly delta compression). I left a fairly aggressive repack running on it, and it took about 30 minutes and left me with these packs, which are MUCH smaller. The deco

  • I've long used 7-Zip when I'm forced to use a Windows system, but I've never used its native 7z format (LZMA).

    From a quick scan of Wikipedia, it seems that the 7z format is LZMA compression with a 64-bit header and optional extras, while the plain lzma tool you describe here produces a raw LZMA compression stream. They are incompatible in that the two tools can't yet process each other's files, which is a shame.

    I can see lzma files replacing bzip2 files in my archives now. How much smaller could CPAN be ma

    -- "It's not magic, it's work..."
  • Have a look at lrzip, which is a combination of LZMA and rzip. That is, it has a preprocessing stage sorting the data somehow and then does LZMA compression.

    It doesn't always compress tighter than LZMA but it's usually much faster.

    % time lzma perl-5.10.0.tar

    real 3m33.665s
    user 3m31.538s
    sys 0m0.530s
    % ls -l perl-5.10.0.tar.lzma
    -rw------- 1 eda eda 10100884 2008-01-25 10:50 perl-5.10.0.tar.lzma
    % time lzma -d perl-5.10.0.t
    -- Ed Avis