  • "Development of this algorithm was sponsored by Intel"? :-)

    • It gets worse! Using PAQ [wikipedia.org], specifically paq8o8 [fit.edu] I get:

      $ time ./paq8o8 -5 perl-5.10.0.tar
      real    171m42.046s
      user    169m41.464s
      sys     0m48.303s
      $ ls -lh perl-5.10.0.tar.paq8o8
      -rw-r--r-- 1 acme acme 6.2M 2008-01-08 16:44 perl-5.10.0.tar.paq8o8

      And 30 minutes to decompress. Very small, but very very slow.

      • So, I started with over a hundred megabytes of tarballs from history.perl.org, and got those down to 6MB of Git pack. Once into the Perforce history, I was looking at reducing the ~400MB of Perforce repository even further. After my initial export, it was already something like 250MB of Git pack (I wrote the exporter to make best use of on-the-fly delta compression). I left a fairly aggressive repack running on it, which took about 30 minutes and left me with these packs [utsl.gen.nz], which are MUCH smaller. Decompression is slower, so people doing a lot of history mining would probably like to "unroll" their pack to be slightly looser.

        Git's compression does a much better job of finding string matches than a straightforward stream compressor. For this reason, I often refer to stream compression as premature compression: once you have two of these archives laid side by side, together they might be representable in 52% of the size they take up as separately compressed archives.
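To make the "premature compression" point concrete, here is a rough sketch in Python. It is not how Git packs objects (Git uses delta compression between objects plus zlib); it only shows the general effect with the standard `lzma` module. The data is synthetic: two hypothetical "releases" made of random bytes that differ in a handful of positions, standing in for two nearly identical tarballs. The names `make_versions` and `compare_sizes` are made up for this illustration.

```python
import lzma
import random


def make_versions(size=20000, edits=20, seed=0):
    """Build two nearly identical blobs (stand-ins for two releases)."""
    rng = random.Random(seed)
    # Random bytes are essentially incompressible on their own,
    # so any saving must come from matches between the two versions.
    v1 = bytes(rng.getrandbits(8) for _ in range(size))
    v2 = bytearray(v1)
    for _ in range(edits):
        # Flip one byte at a random position to simulate a small change.
        v2[rng.randrange(size)] ^= 0xFF
    return v1, bytes(v2)


def compare_sizes(v1, v2):
    """Return (bytes when compressed separately, bytes when compressed together)."""
    separate = len(lzma.compress(v1)) + len(lzma.compress(v2))
    together = len(lzma.compress(v1 + v2))
    return separate, together


v1, v2 = make_versions()
separate, together = compare_sizes(v1, v2)
print(f"separate: {separate} bytes")
print(f"together: {together} bytes")
print(f"ratio:    {together / separate:.2f}")
```

Compressed one at a time, each blob pays full price because the compressor never sees the other copy. Compressed side by side, the second blob is mostly encoded as references back into the first. Real source trees will not behave exactly like random bytes, so treat the ratio this prints as an illustration of the effect, not a prediction.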