NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • I'm amazed by your finding that an uncompressed CPAN is only 13% larger than the compressed version. I would have thought that anything text-based, like a Perl module, would compress very well, even with ZIP or tar.gz.

    I wonder what is taking up all that space and won't compress.

    I know bzip2 is very popular in the Cygwin world, and I've wondered whether, going forward, it would be useful for CPAN (or a future CPAN) to support it as well, to squeeze out a little more compression.

    -- "It's not magic, it's work..."
    • thousands of tiny little files...

    • Elaine's right, and bz2 won't help here. Maybe you'd squeeze things down by another 1%. Maybe. Small files carry a lot of overhead: modern compression programs work better the larger their input, and perl modules just aren't that big. There's also a lot of incompressible overhead in the tar file's structural information.
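A rough way to see the small-input effect: the Python sketch below gzips a batch of small, similar text blobs one at a time, then all together as a single stream. The module text is a made-up template (`Acme::ExampleN` is hypothetical), so any numbers it prints are illustrative only. Compressing separately usually costs much more in total, because each stream starts with an empty history and pays its own gzip header and trailer.

```python
import gzip
from typing import List, Tuple

# Hypothetical stand-in for a small Perl module; real CPAN files will differ.
TEMPLATE = (
    "package Acme::Example{n};\n"
    "use strict;\n"
    "use warnings;\n"
    "our $VERSION = '0.{n:02d}';\n"
    "sub new {{ my ($class, %args) = @_; return bless {{ %args }}, $class; }}\n"
    "1;\n"
    "__END__\n"
    "=head1 NAME\n\nAcme::Example{n} - hypothetical module number {n}\n\n=cut\n"
)


def gzip_size(data: bytes) -> int:
    """Number of bytes `data` occupies after gzip compression."""
    return len(gzip.compress(data))


def compare(blobs: List[bytes]) -> Tuple[int, int, int]:
    """Return (raw bytes, sum of per-blob gzip sizes, gzip size of all blobs joined)."""
    raw = sum(len(b) for b in blobs)
    separate = sum(gzip_size(b) for b in blobs)
    together = gzip_size(b"".join(blobs))
    return raw, separate, together


if __name__ == "__main__":
    blobs = [TEMPLATE.format(n=i).encode("ascii") for i in range(50)]
    raw, separate, together = compare(blobs)
    print(f"raw={raw} separate={separate} together={together}")
```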

      If you wanted to compress perl modules better, you'd want a denser file packing scheme than tar, and build a compression scheme that was prepopulated with a lot of the common perl substri
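One existing building block for a "prepopulated" compressor is zlib's preset dictionary, exposed in Python (3.3+) as the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. The sketch below assumes a small, hand-picked dictionary of Perl-module boilerplate; `PERL_DICT` and `SAMPLE` are hypothetical. A real dictionary would have to be derived from actual CPAN contents and shipped to every client, since decompression needs the identical bytes.

```python
import zlib
from typing import Optional

# Hypothetical dictionary of common Perl-module boilerplate. The zlib manual
# suggests placing the most frequently used strings toward the end.
PERL_DICT = (
    b"=head1 AUTHOR\n\n=head1 COPYRIGHT AND LICENSE\n\n=head1 SYNOPSIS\n\n"
    b"=head1 DESCRIPTION\n\n=head1 NAME\n\n=cut\n1;\n__END__\n\n"
    b"sub new {\n    my ($class, %args) = @_;\n    return bless { %args }, $class;\n}\n\n"
    b"use strict;\nuse warnings;\nour $VERSION = '"
)

# Hypothetical small module used only to exercise the functions below.
SAMPLE = (
    b"package Acme::Tiny;\n"
    b"use strict;\nuse warnings;\nour $VERSION = '0.01';\n\n"
    b"sub new {\n    my ($class, %args) = @_;\n    return bless { %args }, $class;\n}\n\n"
    b"1;\n__END__\n\n=head1 NAME\n\nAcme::Tiny - a hypothetical example\n\n=cut\n"
)


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress in zlib format at level 9, optionally priming the window with zdict."""
    comp = zlib.compressobj(level=9) if zdict is None else zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same zdict used for compression must be supplied."""
    decomp = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return decomp.decompress(data) + decomp.flush()


if __name__ == "__main__":
    plain = deflate(SAMPLE)
    primed = deflate(SAMPLE, PERL_DICT)
    assert inflate(primed, PERL_DICT) == SAMPLE
    print(f"raw={len(SAMPLE)} plain={len(plain)} with_dict={len(primed)}")
```

The dictionary only helps text that actually repeats its strings, which is why it pays off most on exactly the small files that ordinary compression handles poorly.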
      • While I agree that bz2 or some other compressor isn't going to fix the problem, I do find that on a tar of text files it's quite a bit more than 1% more efficient than gzip.

        I can't comment on replacing the tar structure, but I've seen its weaknesses discussed in other places too.

        I'm still amazed at how little CPAN as a whole compresses. The latest module I uploaded, for example, shrank from 90 KB to 24 KB with gzip (22 KB with bz2). What is in there that doesn't compress?

        -- "It's not magic, it's work..."
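Both questions in this sub-thread (tar's per-file overhead, and gzip versus bz2 on a tar of text files) can be measured directly. The Python sketch below builds an uncompressed ustar archive in memory from a few small, hypothetical files and reports its size before and after each compressor. In the ustar layout every member gets a 512-byte header and its data is padded to a multiple of 512 bytes, and Python's `tarfile` also pads the finished archive to a full 10240-byte record. Which compressor wins depends on the input, so the sketch does not assume a result.

```python
import bz2
import gzip
import io
import tarfile
from typing import Dict


def make_tar(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed ustar archive in memory from name -> contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def report(files: Dict[str, bytes]) -> Dict[str, int]:
    """Sizes in bytes: raw payload, tar archive, and the tar after gzip and bz2."""
    tar_bytes = make_tar(files)
    return {
        "payload": sum(len(d) for d in files.values()),
        "tar": len(tar_bytes),
        "tar.gz": len(gzip.compress(tar_bytes)),
        "tar.bz2": len(bz2.compress(tar_bytes)),
    }


if __name__ == "__main__":
    # Hypothetical tiny distribution; names and contents are illustrative only.
    files = {
        "Acme-Tiny-0.01/Makefile.PL": b"use ExtUtils::MakeMaker;\nWriteMakefile(NAME => 'Acme::Tiny');\n",
        "Acme-Tiny-0.01/lib/Acme/Tiny.pm": b"package Acme::Tiny;\nuse strict;\nour $VERSION = '0.01';\n1;\n",
        "Acme-Tiny-0.01/t/basic.t": b"use Test::More tests => 1;\nuse_ok('Acme::Tiny');\n",
        "Acme-Tiny-0.01/MANIFEST": b"Makefile.PL\nlib/Acme/Tiny.pm\nt/basic.t\nMANIFEST\n",
    }
    for kind, size in report(files).items():
        print(f"{kind:8} {size}")
```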
        • Look at the size of many of the files on CPAN. I don't have the space to slurp the whole thing down for analysis, but a quick scan shows that a huge number of the archives are tiny: under 15K, and lots of them under 10K. At that size compressors just don't have enough to work with, so it doesn't matter which compressor you use; there isn't enough data there to compress usefully.

          It's not that the data on CPAN is oddly uncompressible. It's that t
  • To be fair... (Score:3, Informative)

    by belg4mit (967) on 2003.03.07 12:19 (#17797)
    The size of a module's source is not the size of the module.
    In particular, the compiled binary of a moderately complicated XS module is significantly larger than its source.

    Also, what if you only take the latest (or latest two) versions of each module? A lot of authors haven't heard that BackPAN exists, or that the Master Librarian would like to see CPAN stay under 700 MB.
    Were that I say, pancakes?
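The "latest version only" idea can be sketched by grouping archive names by distribution and keeping the highest version of each. This Python sketch rests on simplifying assumptions: file names look like `Dist-Name-1.23.tar.gz`, versions containing `_` are developer releases and are skipped, and two-part versions are read as Perl decimal versions (so 1.9 sorts above 1.10), loosely following how version.pm normalizes them (1.23 becomes v1.230.0). Real CPAN file names are messier than this regex allows.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Simplified pattern for names like "Dist-Name-1.23.tar.gz" (an assumption;
# real CPAN file names are less regular than this).
DIST_RE = re.compile(
    r"^(?P<dist>.+?)-(?P<ver>\d+(?:\.\d+)*(?:_\d+)?)\.(?:tar\.gz|tgz|tar\.bz2|zip)$"
)


def version_key(ver: str) -> Tuple[int, ...]:
    """Comparable key; two-part versions are read as Perl decimals (1.23 -> 1.230)."""
    parts = ver.split(".")
    if len(parts) == 2:
        whole, frac = parts
        frac += "0" * (-len(frac) % 3)  # pad the fraction to whole groups of three digits
        nums = [int(whole)] + [int(frac[i:i + 3]) for i in range(0, len(frac), 3)]
    else:
        nums = [int(p) for p in parts]  # dotted (1.2.3) or bare integer versions
    while len(nums) > 1 and nums[-1] == 0:
        nums.pop()  # drop trailing zeros so 1.2.3.0 and 1.2.3 compare equal
    return tuple(nums)


def latest_only(filenames: Iterable[str]) -> List[str]:
    """Keep only the highest-versioned archive of each distribution."""
    best: Dict[str, Tuple[Tuple[int, ...], str]] = {}
    for name in filenames:
        m = DIST_RE.match(name)
        if m is None or "_" in m.group("ver"):
            continue  # unrecognized name, or a developer release
        key = version_key(m.group("ver"))
        dist = m.group("dist")
        if dist not in best or key > best[dist][0]:
            best[dist] = (key, name)
    return sorted(name for _key, name in best.values())
```

Keeping the latest two instead would mean storing a short sorted list per distribution rather than a single best entry.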
  • Top 10 author directories by size:
    124516  G/GR/GRAHAMC
    82364   J/JH/JHI
    69932   G/GS/GSAR
    63344   C/CN/CNANDOR
    31616   N/NI/NI-S
    31588   I/IL/ILYAZ
    28928   K/KR/KRISHPL
    25244   T/TI/TIMB
    24788   L/LD/LDS
    20228   B/BI/BIRNEY
    All of the above, though, have perl distributions (or documentation distributions).
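A list like this one can be approximated from a local mirror by summing file sizes under each `authors/id/X/XY/AUTHOR` directory. This Python sketch assumes that layout; it reports bytes, while the listing above does not state its units, so the figures need not match.

```python
import os
from collections import Counter
from typing import List, Tuple


def author_totals(id_root: str, top: int = 10) -> List[Tuple[str, int]]:
    """Sum file sizes (bytes) under each X/XY/AUTHOR directory below `id_root`."""
    totals: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(id_root):
        parts = os.path.relpath(dirpath, id_root).split(os.sep)
        if len(parts) < 3:
            continue  # id_root itself, or the one- and two-letter index levels
        author = "/".join(parts[:3])  # e.g. "G/GR/GRAHAMC"
        for name in filenames:
            totals[author] += os.path.getsize(os.path.join(dirpath, name))
    return totals.most_common(top)
```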