
All the Perl that's Practical to Extract and Report

use Perl

acme (189)

http://www.astray.com/

Leon Brocard (aka acme) is an orange-loving Perl eurohacker with many varied contributions to the Perl community, including the GraphViz module on the CPAN. YAPC::Europe was all his fault. He is still looking for a Perl Monger group he can start which begins with the letter 'D'.

Journal of acme (189)

Tuesday January 08, 2008
09:32 AM

LZMA

[ #35330 ]
I've talked about fast compression before, but how about slow compression? Enter LZMA, the Lempel-Ziv-Markov chain algorithm:

$ gunzip perl-5.10.0.tar.gz
$ cp perl-5.10.0.tar ..
$ time gzip -9 perl-5.10.0.tar
real    0m11.490s
user    0m11.405s
sys     0m0.088s
$ cp ../perl-5.10.0.tar .
$ time bzip2 -9 perl-5.10.0.tar
real    0m17.501s
user    0m16.857s
sys     0m0.300s
$ cp ../perl-5.10.0.tar .
$ time lzma -9 perl-5.10.0.tar
real    2m0.121s
user    1m58.735s
sys     0m0.468s

So it's slow. So what?

$ ls -lh perl-5.10.0.tar*
-rw-r--r-- 1 acme acme  12M 2008-01-08 13:18 perl-5.10.0.tar.bz2
-rw-r--r-- 1 acme acme  15M 2007-12-18 17:41 perl-5.10.0.tar.gz
-rw-r--r-- 1 acme acme 9.4M 2008-01-08 13:19 perl-5.10.0.tar.lzma

Ahhh, it compresses better. How about decompression?

$ time gunzip perl-5.10.0.tar.gz
real    0m2.014s
user    0m0.792s
sys     0m0.192s
$ rm perl-5.10.0.tar
$ time bunzip2 perl-5.10.0.tar.bz2
real    0m6.231s
user    0m4.916s
sys     0m0.252s
$ rm perl-5.10.0.tar
$ time unlzma perl-5.10.0.tar.lzma
real    0m2.093s
user    0m1.752s
sys     0m0.216s

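LZMA compresses well (9.4M here, against 12M for bzip2 and 15M for gzip) and is pretty fast at decompression: about the same wall-clock time as gunzip (2.1s versus 2.0s) and roughly three times faster than bunzip2 (6.2s). The price is compression time, two minutes against 11 to 18 seconds. Add another tool to your compression toolbox...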
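
For anyone who wants to repeat the comparison without the command-line tools, here is a minimal sketch using the gzip, bz2 and lzma modules from Python's standard library. It is not what produced the timings above: the function names `benchmark` and `format_report` and the `level` parameter are made up for illustration, the data is compressed in memory rather than file to file, and the numbers you get will differ from the ones in this post.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data, level=9):
    """Compress and decompress `data` with gzip, bzip2 and LZMA.

    Returns {name: (compressed_size, compress_seconds, decompress_seconds)}.
    `level` is a hypothetical knob standing in for the -9 flag; it is passed
    to each codec as its compression level / preset.
    """
    codecs = {
        "gzip": (lambda d: gzip.compress(d, compresslevel=level),
                 gzip.decompress),
        "bzip2": (lambda d: bz2.compress(d, compresslevel=level),
                  bz2.decompress),
        # FORMAT_ALONE is the legacy .lzma container; lzma.decompress
        # auto-detects it when reading the data back.
        "lzma": (lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE,
                                         preset=level),
                 lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        middle = time.perf_counter()
        unpacked = decompress(packed)
        end = time.perf_counter()
        if unpacked != data:
            raise ValueError(name + " round trip did not reproduce the input")
        results[name] = (len(packed), middle - start, end - middle)
    return results


def format_report(results):
    """Turn benchmark() results into printable lines, one per codec."""
    lines = []
    for name, (size, c_secs, d_secs) in results.items():
        lines.append(
            f"{name:6s} {size:>10d} bytes  "
            f"compress {c_secs:.3f}s  decompress {d_secs:.3f}s"
        )
    return lines


if __name__ == "__main__":
    # Small made-up sample so the script runs anywhere; swap in the bytes
    # of a real tarball to get meaningful numbers.
    sample = b"use strict;\nuse warnings;\nprint qq{Hello, world\\n};\n" * 5000
    for line in format_report(benchmark(sample, level=6)):
        print(line)
```

To try it on a real tarball, read the file with `open(path, "rb").read()` and pass the bytes to `benchmark`. Be aware that LZMA at preset 9 needs several hundred megabytes of memory to compress.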

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • "Development of this algorithm was sponsored by Intel"? :-)

    • It gets worse! Using PAQ [wikipedia.org], specifically paq8o8 [fit.edu] I get:

      $ time ./paq8o8 -5 perl-5.10.0.tar
      real    171m42.046s
      user    169m41.464s
      sys     0m48.303s
      $ ls -lh perl-5.10.0.tar.paq8o8
      -rw-r--r-- 1 acme acme 6.2M 2008-01-08 16:44 perl-5.10.0.tar.paq8o8

      And 30 minutes to decompress. Very small, but very very slow.

      • So, I started with over a hundred megabytes of tarballs from history.perl.org, and got those down to 6MB of git pack. Once into the Perforce history, I was looking at reducing the ~400MB of Perforce repository even further. After my initial export, it was already something like 250MB of Git pack (I wrote the exporter to make best use of on-the-fly delta compression). I left a fairly aggressive repack on it going, and it took about 30 minutes and left me with these packs [utsl.gen.nz], which are MUCH smaller. The deco

  • I've long used 7-Zip when I'm forced to use a Windows system, but I've never used its native 7z format (LZMA).

    From a quick scan of Wikipedia it seems that the 7z format is LZMA compression with a 64-bit header and optional extras, while the plain lzma tool you describe here produces a raw LZMA compression stream. They are incompatible in that the two tools can't yet process each other's files, which is a shame.

    I can see lzma files replacing bzip2 files in my archives now. How much smaller could CPAN be ma

    --
    -- "It's not magic, it's work..."
  • Have a look at lrzip [http://ck.kolivas.org/apps/lrzip/] which is a combination of LZMA and rzip. That is, it has a preprocessing stage sorting the data somehow and then does LZMA compression.

    It doesn't always compress tighter than LZMA but it's usually much faster.

    % time lzma perl-5.10.0.tar

    real 3m33.665s
    user 3m31.538s
    sys 0m0.530s
    % ls -l perl-5.10.0.tar.lzma
    -rw------- 1 eda eda 10100884 2008-01-25 10:50 perl-5.10.0.tar.lzma
    % time lzma -d perl-5.10.0.t
    --
    -- Ed Avis ed@membled.com