Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
oooh, smaller! (Score:2)
"Development of this algorithm was sponsored by Intel"? :-)
Re: (Score:2)
And 30 minutes to decompress. Very small, but very very slow.
For comparison, the complete history of Perl (Score:1)
So, I started with over a hundred megabytes of tarballs from history.perl.org, and got those down to 6MB of Git pack. Once I got into the Perforce history, I was looking at reducing the ~400MB of Perforce repository even further. After my initial export, it was already something like 250MB of Git pack (I wrote the exporter to make best use of on-the-fly delta compression). I left a fairly aggressive repack running on it; it took about 30 minutes and left me with these packs [utsl.gen.nz], which are MUCH smaller. Decompression is slower, so people doing a lot of history mining would probably like to "unroll" their pack to be slightly looser.
Git's compression does a much better job of finding string matches than a straightforward stream compressor. For this reason, I often refer to stream compression as premature compression: once you have two of these archives laid side by side, the pair might be representable in 52% of the size they take up as separately compressed archives.
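To make the "premature compression" point concrete, here is a rough sketch in Python. It is not Git's delta algorithm, just plain zlib, and the two "releases" are made-up data (hypothetical names, deterministic pseudo-random bytes) chosen so that each one is incompressible on its own. It only illustrates that compressing near-identical inputs separately hides the redundancy between them.

```python
import hashlib
import zlib


def pseudo_random_bytes(seed: bytes, size: int) -> bytes:
    """Deterministic, effectively incompressible bytes built from SHA-256 output."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:size])


# Two hypothetical "releases": the second is the first plus a small edit in the middle.
release_a = pseudo_random_bytes(b"release", 8 * 1024)
release_b = release_a[:4000] + b"a small patch" + release_a[4000:]

# Premature compression: each archive is compressed on its own,
# so the compressor never sees that the two are nearly identical.
separate = len(zlib.compress(release_a, 9)) + len(zlib.compress(release_b, 9))

# Compressing both together lets the compressor match release_b against release_a.
together = len(zlib.compress(release_a + release_b, 9))

print(f"separate: {separate} bytes, together: {together} bytes")
```

The inputs are kept to 8KB each on purpose: zlib can only look back 32KB, so with real tarballs the second copy would be out of reach. Git's packing does not have that limitation, since it deltas whole objects against each other.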