
All the Perl that's Practical to Extract and Report


acme (189)

http://www.astray.com/

Leon Brocard (aka acme) is an orange-loving Perl eurohacker with many varied contributions to the Perl community, including the GraphViz module on the CPAN. YAPC::Europe was all his fault. He is still looking for a Perl Monger group he can start which begins with the letter 'D'.

Journal of acme (189)

Sunday June 01, 2003
08:50 AM

CPAN Size (Take two)

[ #12543 ]
A while ago I commented that CPAN was about the same size uncompressed as it was compressed. I guessed that this was because most distributions were small so that compression didn't help much. Inspired by some detecting-charsets-using-compression talk on IRC, I added packed and unpacked size to CPANTS.

For example, Acme-Buffy-1.3.tar.gz takes 2,381 bytes compressed, 5,170 bytes uncompressed (the total of all the file sizes, assuming directories are free).
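A minimal sketch of how those two numbers could be measured for a single tarball, in Python rather than whatever CPANTS actually runs. The function name `packed_and_unpacked_size` is made up for illustration; only regular files are counted, so directories are free.

```python
import os
import tarfile


def packed_and_unpacked_size(path):
    """Return (packed, unpacked) sizes in bytes for a .tar.gz file.

    packed   = size of the archive on disk
    unpacked = sum of the sizes of the regular files inside it
               (directories and other special entries count as zero)
    """
    packed = os.path.getsize(path)
    with tarfile.open(path, "r:gz") as archive:
        unpacked = sum(m.size for m in archive.getmembers() if m.isfile())
    return packed, unpacked
```

Calling `packed_and_unpacked_size("Acme-Buffy-1.3.tar.gz")` on a local copy of the distribution should give a pair comparable to the figures above, assuming CPANTS counts sizes the same way.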

The top 10 biggest compressed packages on CPAN are (ignoring Perl and Parrot): Unicode-Unihan-0.02.tar.gz (4,513,673), Chart-2.2.tar.gz (4,405,514), Harvey-1.02.1.tar.gz (4,358,848), Tk-800.024.tar.gz (3,489,636), bioperl-1.2.1.tar.gz (3,488,040), bioperl-1.2.tar.gz (3,425,575), Tk800.015.tar.gz (3,330,861), Lingua-ZH-CCDICT-0.02.tar.gz (2,704,573), bioperl-1.0.2.tar.gz (2,645,781), bioperl-1.0.tar.gz (2,547,171).

The top 10 biggest uncompressed packages on CPAN are (ignoring Perl and Parrot): Harvey-1.02.1.tar.gz (93,673,151), Net-SCP-Expect-0.09.tar.gz (43,253,713), Chart-2.2.tar.gz (15,322,596), Tk-800.024.tar.gz (14,959,413), Tk800.015.tar.gz (14,217,659), Unicode-Unihan-0.02.tar.gz (12,924,673), bioperl-1.2.1.tar.gz (12,555,654), Lingua-ZH-CCDICT-0.02.tar.gz (12,508,332), bioperl-1.2.tar.gz (12,245,838), DBIx-DBStag-0.01.tar.gz (11,189,211).

Compression does help a lot in some cases. For example, Net::SCP::Expect has a 40M test file which consists of the words "This is the small file For use in testing Net::SCP::Expect only Delete at your convenience" over and over. The distribution itself compresses down to a mere 159,500 bytes.
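To get a feel for why such a file shrinks to almost nothing, here is a small Python sketch that gzips the same sentence repeated many times. It is an illustration only: the real test file's exact layout and line breaks are an assumption, and the sample is far smaller than 40M to keep it quick.

```python
import gzip

# The sentence quoted above; the trailing newline is an assumption.
LINE = (b"This is the small file For use in testing "
        b"Net::SCP::Expect only Delete at your convenience\n")


def repeated_text_ratio(repeats=10_000):
    """Return packed/unpacked size for LINE repeated `repeats` times."""
    raw = LINE * repeats
    packed = gzip.compress(raw)
    return len(packed) / len(raw)
```

`repeated_text_ratio()` comes out as a tiny fraction, the same order of magnitude as the ratio for the real distribution.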

Only one distribution is actually bigger packed than unpacked: Bundle-Tk_OS2src-1.00.tar.gz is 576 bytes packed, 548 bytes unpacked.

Does a high compressibility have any relation to how "good" a module is? Make your own mind up. The top 10 distributions that compress badly (the number is packed size divided by unpacked size, so higher is worse): Bundle-Tk_OS2src-1.00.tar.gz (1.0511), Image-Magick-Thumbnail-0.01.tar.gz (0.9620), Wx-Sample-XS-0.01.tar.gz (0.9583), File-Find-Rule-MMagic-0.02.tar.gz (0.9461), StatsView-1.4.tar.gz (0.9290), File-Find-Rule-ImageSize-0.03.tar.gz (0.9283), Image-Maps-Plot-FromLatLong-0.1.tar.gz (0.9264), Bundle-Expect-1.09.tar.gz (0.9187), Image-Density-0.1.tar.gz (0.9088), HTTP-Lite-2.1.4.tar.gz (0.9054).

And the top 11 distributions that compress well: Net-SCP-Expect-0.09.tar.gz (0.0037), Class-Skin-0.05.tar.gz (0.0200), Acme-Ook-0.10.tar.gz (0.0315), Parse-Nibbler-1.10.tar.gz (0.0382), DBIx-DBStag-0.01.tar.gz (0.0426), Harvey-1.02.1.tar.gz (0.0465), CSS-1.01.tar.gz (0.0470), SuperPython-0.91.tar.gz (0.0499), Config-ApacheFormat-1.1.tar.gz (0.0600), EasyArgs-1.00.tar.gz (0.0633), Cisco-ShowIPRoute-Parser-1.01.tar.gz (0.0650).
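The numbers in parentheses appear to be packed size divided by unpacked size, which matches the byte counts quoted earlier in this post. A small Python sketch that reproduces a few of them from those byte counts (the `SIZES` table and `ratios` helper are names made up for this example):

```python
# (packed bytes, unpacked bytes), taken from the figures quoted in this post
SIZES = {
    "Acme-Buffy-1.3.tar.gz": (2_381, 5_170),
    "Bundle-Tk_OS2src-1.00.tar.gz": (576, 548),
    "Harvey-1.02.1.tar.gz": (4_358_848, 93_673_151),
    "Net-SCP-Expect-0.09.tar.gz": (159_500, 43_253_713),
}


def ratios(sizes):
    """Return (name, packed/unpacked) pairs, best-compressing first."""
    pairs = [(name, round(packed / unpacked, 4))
             for name, (packed, unpacked) in sizes.items()]
    return sorted(pairs, key=lambda pair: pair[1])


for name, ratio in ratios(SIZES):
    print(f"{name} ({ratio:.4f})")
```

This prints Net-SCP-Expect first at 0.0037 and Bundle-Tk_OS2src last at 1.0511, in line with the two lists above.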

What does this all mean? Dunno, don't ask me, I just make the numbers up ;-)
