
All the Perl that's Practical to Extract and Report


acme (189)

http://www.astray.com/

Leon Brocard (aka acme) is an orange-loving Perl eurohacker with many varied contributions to the Perl community, including the GraphViz module on the CPAN. YAPC::Europe was all his fault. He is still looking for a Perl Monger group he can start which begins with the letter 'D'.

Journal of acme (189)

Friday April 25, 2003
07:46 AM

CPANTS

[ #11845 ]
A long, long time ago, in the deep past, at YAPC::Europe 2001 in Amsterdam, Schwern presented an idea. Well, a couple of ideas. He bundled all these ideas up into one and called them CPANTS. You should probably read his synopsis. At the end of the conference he gathered volunteers and Jos came from out of nowhere and started CPANPLUS. Since then CPANPLUS has gotten much cleverer (it can now automatically send test results to CPAN testers). However, CPANTS as a whole hadn't magically emerged from the mailing lists and discussions.

At the German Perl Workshop this year I got started thinking about it with Jos, Nicholas and Thomas. Metadata is very useful. Metrics are useful. Kwalitee (note that it's not quality) is useful. So I thought for a little bit. I wrote a quick script to unpack all the modules on CPAN, and a couple of scripts to go through these modules. And decided the best place for metadata about CPAN would be, errr, on CPAN. And it got a little more complicated. Then Richard helped. And I ended up with Module::CPANTS.

For example, this is what Module::CPANTS knows about Acme::Colour:

  'Acme-Colour-0.20.tar.gz' => {
    'author' => 'LBROCARD',
    'description' => 'additive and subtractive human-readable colours',
    'files' => [
      'Makefile.PL',
      'README',
      'MANIFEST'
    ],
    'lines' => {
      'nonpod' => 170,
      'pod' => 95,
      'total' => 265
    },
    'requires' => [
      'Graphics-ColorNames-0.32.tar.gz',
      'Scalar-List-Utils-1.11.tar.gz',
      'Test-Simple-0.47.tar.gz'
    ],
    'requires_module' => {
      'Graphics::ColorNames' => 0,
      'List::Util' => 0,
      'Test::Simple' => 0
    },
    'requires_recursive' => [
      'File-Spec-0.82.tar.gz',
      'Graphics-ColorNames-0.32.tar.gz',
      'Scalar-List-Utils-1.11.tar.gz',
      'Test-Harness-2.26.tar.gz',
      'Test-Simple-0.47.tar.gz'
    ],
    'testers' => {
      'fail' => 1,
      'pass' => 5
    }
  };

Note that it gets the description (even though the module isn't in the modules list; only a quarter of the distributions on CPAN are), lines-of-code metrics, prerequisites (and recursive prerequisites), CPAN testers data, and a list of notable files. A whole bunch of metadata.
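To give a feel for what you can do with a record like that, here is a minimal sketch. It assumes the record is available as a plain hash reference; the variable $record and the three helper subs are hypothetical names for illustration, not part of Module::CPANTS itself.

```perl
use strict;
use warnings;

# Hypothetical record: the relevant parts of the Acme::Colour entry above.
my $record = {
    lines    => { nonpod => 170, pod => 95, total => 265 },
    requires => [
        'Graphics-ColorNames-0.32.tar.gz',
        'Scalar-List-Utils-1.11.tar.gz',
        'Test-Simple-0.47.tar.gz',
    ],
    requires_recursive => [
        'File-Spec-0.82.tar.gz',
        'Graphics-ColorNames-0.32.tar.gz',
        'Scalar-List-Utils-1.11.tar.gz',
        'Test-Harness-2.26.tar.gz',
        'Test-Simple-0.47.tar.gz',
    ],
    testers => { fail => 1, pass => 5 },
};

# Share of all lines that are POD (0 if there are no lines at all).
sub pod_ratio {
    my ($r) = @_;
    my $lines = $r->{lines} || {};
    return $lines->{total} ? $lines->{pod} / $lines->{total} : 0;
}

# Fraction of CPAN testers reports that passed (undef if there are none).
sub pass_rate {
    my ($r) = @_;
    my $t    = $r->{testers} || {};
    my $pass = $t->{pass} || 0;
    my $n    = $pass + ($t->{fail} || 0);
    return $n ? $pass / $n : undef;
}

# Prerequisites that are only pulled in indirectly.
sub indirect_prereqs {
    my ($r) = @_;
    my %direct = map { $_ => 1 } @{ $r->{requires} || [] };
    return grep { !$direct{$_} } @{ $r->{requires_recursive} || [] };
}

printf "pod ratio: %.2f\n", pod_ratio($record);
printf "pass rate: %.2f\n", pass_rate($record);
print  "indirect:  $_\n" for indirect_prereqs($record);
```

Nothing clever, which is rather the point: once the metadata is a hash, most questions are a few lines of Perl.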

Before you start complaining: it's just the beginning. Module::CPANTS will never do all that the CPANTS synopsis mentions. No, it doesn't have many metrics (I don't want to evaluate all the Perl code on CPAN, for one). And the interface is, well, a really big hash. Adding new metrics is easy (you may need a little disk space to test it ;-). Patches are welcome. The kwalitee() method is uncoded as yet, because I think most people want to weight metrics in different ways. I plan on releasing a version of Module::CPANTS every week, so it's only ever a couple of days out of sync with CPAN, although you can shout "database" and "web service" at me if you want.
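Since kwalitee() is uncoded, the following is only a sketch of what a caller-weighted score might look like. Everything here is an assumption: the sub name kwalitee, its arguments, and the three example metrics (has_readme, documented, tests_pass) are made up for illustration and operate on the kind of record shown earlier.

```perl
use strict;
use warnings;

# Hypothetical metrics: each takes one record and returns a number from 0 to 1.
my %metric = (
    has_readme => sub {
        my ($r) = @_;
        return scalar(grep { $_ eq 'README' } @{ $r->{files} || [] }) ? 1 : 0;
    },
    documented => sub {
        my ($r) = @_;
        my $lines = $r->{lines} || {};
        return $lines->{total} ? $lines->{pod} / $lines->{total} : 0;
    },
    tests_pass => sub {
        my ($r) = @_;
        my $t    = $r->{testers} || {};
        my $pass = $t->{pass} || 0;
        my $n    = $pass + ($t->{fail} || 0);
        return $n ? $pass / $n : 0;
    },
);

# Weighted average of the chosen metrics; the caller supplies the weights.
sub kwalitee {
    my ($record, $weight) = @_;
    my ($sum, $total) = (0, 0);
    for my $name (keys %$weight) {
        my $code = $metric{$name} or die "unknown metric '$name'\n";
        $sum   += $weight->{$name} * $code->($record);
        $total += $weight->{$name};
    }
    return $total ? $sum / $total : 0;
}

# Someone who cares three times as much about passing tests as about a README:
my $record = {
    files   => [ 'Makefile.PL', 'README', 'MANIFEST' ],
    lines   => { nonpod => 170, pod => 95, total => 265 },
    testers => { fail => 1, pass => 5 },
};
printf "kwalitee: %.3f\n", kwalitee($record, { has_readme => 1, tests_pass => 3 });
```

Keeping the weights outside the metrics means two people can disagree about what matters without disagreeing about the data.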

Is this interesting? I think so. There are a whole lot of extra metrics that could be added. And if we set up a secure sandbox of some sort so that it's safe to evaluate all that Perl code, a couple more metrics. What easy-to-calculate metrics influence what you think of a Perl module? Want to help out? Check out Module::CPANTS::Generator.

And no, it's not just a plan to create the largest module on CPAN...

  • metrics (Score:2, Interesting)

    Schwern's synopsis [develooper.com] has so many good ideas in it. I think you are on to something starting out with something relatively simple and then moving forward with new stuff as it becomes useful.

    I'd be interested in a couple of metrics, some of which might be easy to calculate.

    • Last update: the date of the last release on CPAN, which could be used as a measure of how actively the module is being maintained.
    • Versions: a list of how many versions the module has been through to get an idea of the module's lifecycle
    • Thanks for the tip. It should be easy to get the time of last update from CPANPLUS. Getting the number of versions is a little harder - it should be possible to get it from backpan [develooper.com], but it'd be much easier to get if there was an index listing of it all. RT seems possible too. I wonder if there's a way to get at a full RT database dump...

      The main problem is getting access to this data without doing a billion web lookups. For the testers information, I use the NNTP interface to the testers mailing list, whi

      • Rather than pulling all the data every couple of days and putting it into a module, have you considered turning your Module::CPANTS::Generator module into the main module itself? Have it just pull the data on demand. No reason to go pull all the information from the "Meta" module each day if no one is ever going to use it.

        --
        J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
        • I did consider that, but I figure the meta information is important enough to have before you go to the trouble of downloading the module. For instance, it's nice to know prerequisites for a module without having to download the module and its prerequisites and their prerequisites. This way you get metrics for everything on CPAN without having to unpack the whole of CPAN. Disk space is cheap, but not everyone has the bandwidth and a couple of gigs free.
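
          As a rough sketch of that idea, recursive prerequisites can be worked out from the direct ones alone, with no downloading. The %cpants hash and the requires_recursive sub below are hypothetical, and the dependency edges other than Acme-Colour's own direct requirements are assumed purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical slice of the metadata: distribution => direct prerequisites.
# Only the Acme-Colour entry comes from the post; the other edges are assumed.
my %cpants = (
    'Acme-Colour-0.20.tar.gz' => {
        requires => [
            'Graphics-ColorNames-0.32.tar.gz',
            'Scalar-List-Utils-1.11.tar.gz',
            'Test-Simple-0.47.tar.gz',
        ],
    },
    'Test-Simple-0.47.tar.gz'  => { requires => ['Test-Harness-2.26.tar.gz'] },
    'Test-Harness-2.26.tar.gz' => { requires => ['File-Spec-0.82.tar.gz'] },
);

# Walk the prerequisite graph breadth-first, visiting each distribution once.
sub requires_recursive {
    my ($data, $dist) = @_;
    my $start = $data->{$dist} or return;
    my %seen;
    my @queue = @{ $start->{requires} || [] };
    while (defined(my $d = shift @queue)) {
        next if $seen{$d}++;          # already handled, also guards against cycles
        my $entry = $data->{$d};      # may be missing if we know nothing about it
        push @queue, @{ $entry->{requires} || [] } if $entry;
    }
    return sort keys %seen;
}

print "$_\n" for requires_recursive(\%cpants, 'Acme-Colour-0.20.tar.gz');
```

          Call it in list context. Distributions with no entry in the hash are still listed as prerequisites, they just contribute no further edges.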
  • Here it is then [caseywest.com].

    Now, I just need to find someone to maintain it. Being emailed to acme for safe keeping.
    --
    Casey West