Saturday September 06, 2008
01:21 PM

Cataloging BackPAN: MiniCPAN done in 9 hours

[ #37375 ]

My BackPAN indexer (YAPC::EU 2008 slides) made its first complete pass through my MiniCPAN yesterday:

  • Distributions processed: 16039
  • Indexing failures: 782 (4.8%)
  • Run time: 9 hours (0.49 dists / sec)

The total size of BackPAN is about 100,000 distributions, so at the same rate I think I could index all of BackPAN in less than a week.
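The estimate can be checked with a little arithmetic. This is only a back-of-the-envelope sketch in Python: the processed count, failure count, and run time are the figures reported above, the 100,000 total is the post's round number, and the whole thing assumes the rate stays constant across BackPAN, which it may not.

```python
# Figures reported for the MiniCPAN run.
dists_processed = 16039
failures = 782
run_time_hours = 9

# Assumption: the post's round figure for the size of BackPAN.
backpan_dists = 100_000

# Throughput in distributions per second.
rate = dists_processed / (run_time_hours * 3600)

# Share of distributions that failed to index.
failure_pct = 100 * failures / dists_processed

# Projected time for all of BackPAN at the same rate.
est_hours = backpan_dists / rate / 3600
est_days = est_hours / 24

print(f"rate: {rate:.3f} dists/sec")
print(f"failures: {failure_pct:.2f}%")
print(f"estimated BackPAN run: {est_hours:.1f} hours ({est_days:.1f} days)")
```

At a constant rate the projection comes out to a bit over two days of continuous running, so "less than a week" leaves a comfortable margin for slower or larger distributions.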

Right now I output everything as YAML, one YAML file per distribution. The data organization is sloppy and sometimes redundant because I haven't paid attention to it. You can get the tarball of all 16,000 files. Take a look to see if there might be anything else you'd want the indexer to record about a distribution. If you're interested in making some sort of CPAN service, let me do the work of cataloging the information you need.

If you want to play with this, get MyCPAN::Indexer from CPAN Search, or if you want to play with everything, check out the sources from GitHub. You probably can't install it from CPAN since it depends on a couple of modules which only have developer releases right now.

The thing you'd want to play with is examples/backpan_indexer.pl. It's a little messy right now because I bolted on a Tk interface (see video one or two) that lives in examples/tk.pl and a dispatcher that lives in examples/steak.pl. My next step is to make those pluggable modules so you can note in the configuration file which interface and dispatcher you want, and as long as they have the right interface, they'll do whatever they do.
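To make the pluggable idea concrete, here is a minimal sketch in Python of a configuration file choosing an interface and a dispatcher by name. Everything in it is hypothetical: the config keys, the registry names, and the functions are invented for illustration and are not taken from MyCPAN::Indexer.

```python
from typing import Callable, Dict, List

# Hypothetical plugin signatures: an interface reports a message,
# a dispatcher decides how the distributions get processed.
Reporter = Callable[[str], str]
Dispatcher = Callable[[List[str], Reporter], List[str]]


def text_interface(message: str) -> str:
    """Plain-text reporting, standing in for a real UI."""
    return f"[text] {message}"


def quiet_interface(message: str) -> str:
    """An interface that reports nothing."""
    return ""


def serial_dispatcher(dists: List[str], report: Reporter) -> List[str]:
    """Process distributions one at a time, in order."""
    return [report(f"indexed {dist}") for dist in dists]


# Registries mapping config-file names to implementations.
INTERFACES: Dict[str, Reporter] = {"text": text_interface, "quiet": quiet_interface}
DISPATCHERS: Dict[str, Dispatcher] = {"serial": serial_dispatcher}


def parse_config(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def run(config_text: str, dists: List[str]) -> List[str]:
    """Look up the configured plugins and hand the work to the dispatcher."""
    config = parse_config(config_text)
    report = INTERFACES[config["interface"]]
    dispatch = DISPATCHERS[config["dispatcher"]]
    return dispatch(dists, report)


if __name__ == "__main__":
    for line in run("interface = text\ndispatcher = serial", ["Foo-Bar-0.05.tgz"]):
        print(line)
```

The point of the sketch is only that the main program never names a concrete interface or dispatcher; anything registered under the configured name with the right call signature will do.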

After a little bit more work on the indexing stuff, the next step is to take all of those YAML files and distill them into something that is easier to search, then hook up some sort of search interface to them. I'll probably first write a command-line tool (although with wonderful MVCness). I want to feed the index any file in @INC and get a report:

$ cpan_index `perldoc -l Foo`
Foo.pm's fingerprint found in Foo-Bar-0.05.tgz
    Author: Joe Snuffy (SNUFFY@cpan.org)
    Release date: Nov 11, 1998, 23:59:59
    Version: 0.05
    Latest version on CPAN: Foo-Bar-0.06.tgz
    Current maintainers:
        Joe Snuffy (SNUFFY@cpan.org)  (first come)
        Joe Cool (CAMEL@cpan.org)     (co-maintainer)
    Also came with:
        !!!Bar.pm, installed version 0.08 (does not match Bar.pm from Foo-Bar-0.05.tgz)
        ABC.pm, installed version 0.05 (matches ABC.pm in Foo-Bar-0.05.tgz)
    Depends on:
        Baz.pm from Baz-0.67.tgz
        Quux.pm from Quux-0.01.tgz
    CPAN Testers Matrix: ...
    Release history:
        0.01  Dec 31, 1969, 23:59:59  SNUFFY  (BackPAN)
        0.02  Jan 31, 1995, 23:59:59  SNUFFY  (BackPAN)
        0.03  Jun  6, 1996, 23:59:59  SNUFFY  (BackPAN)
        0.04  Oct 31, 1997, 23:59:59  SNUFFY  (BackPAN)
    ****0.05  Nov 11, 1998, 23:59:59  SNUFFY  (CPAN)
        0.06  Sep  5, 2008, 23:59:59  CAMEL   (CPAN)
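
The fingerprint lookup at the top of that report could work roughly like the Python sketch below. It is a guess at the shape of the idea, not the indexer's actual method: it assumes a fingerprint is a SHA-256 digest of the file's bytes, and the index layout, function names, and sample file contents are all made up for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical index layout: digest -> list of (distribution, path in distribution).
Index = Dict[str, List[Tuple[str, str]]]


def fingerprint(content: bytes) -> str:
    """Assumed fingerprint: the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(content).hexdigest()


def build_index(dists: Dict[str, Dict[str, bytes]]) -> Index:
    """Record the fingerprint of every file in every distribution."""
    index: Index = {}
    for dist_name, files in dists.items():
        for path, content in files.items():
            index.setdefault(fingerprint(content), []).append((dist_name, path))
    return index


def lookup(index: Index, content: bytes) -> List[Tuple[str, str]]:
    """Return every (distribution, path) whose file matches byte for byte."""
    return index.get(fingerprint(content), [])


# Made-up stand-ins for two releases of the same module.
dists = {
    "Foo-Bar-0.05.tgz": {"lib/Foo.pm": b"package Foo; our $VERSION = '0.05'; 1;\n"},
    "Foo-Bar-0.06.tgz": {"lib/Foo.pm": b"package Foo; our $VERSION = '0.06'; 1;\n"},
}
index = build_index(dists)

# Pretend this is the file that `perldoc -l Foo` points at.
installed = b"package Foo; our $VERSION = '0.05'; 1;\n"
matches = lookup(index, installed)

for dist_name, path in matches:
    print(f"fingerprint found in {dist_name} ({path})")
```

An exact digest only matches files that are identical byte for byte, so a locally patched module would simply not be found; that is the same situation the report flags with "does not match".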

  • The Vimeo links don't work. I'm getting "Video not found".

    That being said, this is really great work.

    • I posted as soon as I uploaded the videos, and Vimeo needs some time to process them. They are a bit weird about when they actually make them available. The same links should work now.

  • By the way, yapc.tv's video went online [yapc.tv] this (European) night.

    • Woo hoo! Thanks!

      Now YAPC.tv needs a little badge that I can put on my web page next to the other details for the talk. I know it would be a lot of work, but a YAPC.tv logo in the corner of the video would be sweet too. :)

      • Each page now contains an example of HTML code that can be put on a person's website to link to the talk.

        • Ah, I meant a cool "YAPC.tv" logo that says "YAPC.tv". I was also thinking that having an overlay on the actual video that says "YAPC.tv" would be nice. Maybe people wouldn't like that on their videos, but I don't mind promoting the project since you did the work. :)