

  • I'm amazed by your finding that an uncompressed CPAN is only 13% larger than the compressed version. I would have thought that anything text-based, like a Perl module, should compress very well, even with ZIP or tar.gz.

    I wonder what is taking up all that space and won't compress?

    I know bzip2 is very popular in the Cygwin world, and I've wondered whether it would be useful for CPAN, or a future CPAN, to support it as well, to squeeze out a little more compression.

    -- "It's not magic, it's work..."
    • thousands of tiny little files...

    • Elaine's right, and bz2 won't help here. Maybe you'd squeeze things down by another 1%. Maybe. There's a lot of overhead to small files--modern compression programs work better the larger their input, and Perl modules just aren't that big. There's also a lot of incompressible overhead in the tar file structure information.
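To put a rough number on the tar structure overhead mentioned above, here is a minimal sketch using Python's standard `tarfile` module. The file names and the 16-byte "module" body are made up for illustration; the point is only that every member costs a 512-byte header plus padding to a 512-byte boundary, no matter how small the file is.

```python
import io
import tarfile


def tar_size(payloads):
    """Return the size in bytes of an uncompressed tar holding the payloads."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i, data in enumerate(payloads):
            # Hypothetical member names, one per tiny file.
            info = tarfile.TarInfo(name=f"lib/Module{i}.pm")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# 100 tiny 16-byte files standing in for very small modules (an assumption).
payloads = [b"package Foo;\n1;\n"] * 100
raw_bytes = sum(len(p) for p in payloads)
archive_bytes = tar_size(payloads)

print("payload bytes:", raw_bytes)
print("tar bytes:    ", archive_bytes)
```

Much of that bulk is zero padding, which gzip squeezes out easily, but the per-file header fields (names, sizes, checksums) differ from member to member and still have to be encoded.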

      If you wanted to compress perl modules better, you'd want a denser file packing scheme than tar, and build a compression scheme that was prepopulated with a lot of the common perl substri
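The "prepopulated" idea can be sketched with zlib's preset-dictionary support, which Python exposes through the `zdict` argument. This is only an illustration under assumptions: `PERL_DICT` and the sample module are invented here, and a real dictionary would have to be built from an actual survey of CPAN contents.

```python
import zlib

# Hypothetical dictionary of strings that turn up in many Perl modules.
PERL_DICT = (
    b"use strict;\nuse warnings;\nour $VERSION = '';\n"
    b"sub new {\n    my ($class, %args) = @_;\n"
    b"    my $self = bless {}, $class;\n    return $self;\n}\n"
    b"1;\n__END__\n=head1 NAME\n=head1 SYNOPSIS\n=head1 DESCRIPTION\n=cut\n"
)


def deflate(data, zdict=None):
    """Compress data with zlib, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob, zdict=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# A made-up tiny module, typical boilerplate only.
module = (
    b"package My::Tiny;\nuse strict;\nuse warnings;\n"
    b"our $VERSION = '0.01';\n"
    b"sub new {\n    my ($class, %args) = @_;\n"
    b"    my $self = bless {}, $class;\n    return $self;\n}\n"
    b"1;\n__END__\n=head1 NAME\n"
)

plain = deflate(module)
primed = deflate(module, PERL_DICT)
print(len(module), len(plain), len(primed))
```

The catch is that both ends need the identical dictionary, so it would have to ship with the tools rather than with each archive.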
      • While I agree that bz2 or some other compressor isn't going to fix the problem, I do find that on a tar of text files it's quite a bit more than 1% more efficient than gzip.

        I can't comment on replacing the tar structure, but I've seen comments on its weaknesses in other places too.

        I'm still amazed at how little compression there is in CPAN. The latest module I've uploaded, for example, shrank from 90kb to 24kb with gzip (22kb with bz2). What is in there that doesn't compress?

        -- "It's not magic, it's work..."
        • Look at the size of many of the files on CPAN. I don't have the space to slurp the whole thing down for analysis, but a quick scan shows that a huge number of the archives are tiny--less than 15K, and lots of them less than 10K. At that size a compressor just doesn't have enough input to work with, so it doesn't matter which compressor you use.
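The size effect is easy to see on synthetic data. The sketch below gzips progressively longer prefixes of some generated Perl-like text and records compressed size divided by original size. The text is invented and far more repetitive than real modules, so only the trend is meaningful, not the specific ratios.

```python
import gzip

# Synthetic, highly regular Perl-like accessors (an assumption, not CPAN data).
line = b"sub get_%d { my ($self) = @_; return $self->{field_%d}; }\n"
text = b"".join(line % (i, i) for i in range(2000))


def gzip_ratio(data):
    """Compressed size divided by original size; lower means better."""
    return len(gzip.compress(data)) / len(data)


ratios = {}
for size in (100, 1_000, 10_000, 100_000):
    ratios[size] = gzip_ratio(text[:size])
    print(f"{size:>7} bytes in -> ratio {ratios[size]:.2f}")
```

On very short inputs the fixed gzip header and trailer, plus the lack of earlier text to refer back to, eat most of the gain; the ratio improves as the input grows.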

          It's not that the data on CPAN is oddly uncompressible. It's that t