
All the Perl that's Practical to Extract and Report


tsee (4409)

tsee
  reversethis-{gro.napc} {ta} {relleums}
http://steffen-mueller.net/

You can find most of my Open Source Perl software in my CPAN directory [cpan.org].

Journal of tsee (4409)

Friday September 05, 2008
09:19 AM

PAUSE on CPAN, indexing $stuff

[ #37362 ]

Most likely, everybody who reads this directly or indirectly depends on the operation of the PAUSE indexer (aka mldistwatch). The PAUSE indexer scans new distributions on the CPAN (really on PAUSE at that point) for the packages/namespaces and associated versions they contain, sends the uploader a friendly message with the results, and adds the information to the metadata that's used by our toolchain when people install modules from the CPAN.

The PAUSE code was written and is still maintained by Andreas König. It's a rather large and unquestionably rather complex piece of software.

I don't think I'm giving anybody a big surprise if I say that being able to run this same indexer on a given tarball or zip offline may be useful for some toolchain modules. One example would be generating the META.yml provides section.
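To make the idea concrete, here is a minimal sketch of what such an offline scan could look like. It is not the PAUSE indexer's actual logic and not PAR::Indexer's API: it is written in Python purely for illustration, the function names (`scan_source`, `build_provides`, `make_demo_archive`) are made up, and the two regexes are deliberately naive. It assumes a ZIP archive, only looks at `.pm` files, and applies the first `$VERSION` it finds in a file to every package declared in that file. The real indexer handles far more cases than this.

```python
import io
import re
import zipfile

# Naive patterns (assumption): the real indexer is much more careful.
PACKAGE_RE = re.compile(r'^\s*package\s+([A-Za-z_][\w:]*)\s*;', re.MULTILINE)
VERSION_RE = re.compile(
    r'''^\s*(?:our\s+)?\$VERSION\s*=\s*['"]?([\d._]+)['"]?\s*;''',
    re.MULTILINE,
)


def scan_source(source):
    """Return {package: version_or_None} for one Perl source string.

    Simplification: the first $VERSION in the file is used for every
    package declared in that file.
    """
    match = VERSION_RE.search(source)
    version = match.group(1) if match else None
    return {name: version for name in PACKAGE_RE.findall(source)}


def build_provides(archive_bytes):
    """Build a META.yml-style 'provides' mapping from a ZIP archive."""
    provides = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".pm"):
                continue  # only module files are scanned in this sketch
            source = archive.read(name).decode("utf-8", errors="replace")
            for package, version in scan_source(source).items():
                entry = {"file": name}
                if version is not None:
                    entry["version"] = version
                # Keep the first file that declares a given package.
                provides.setdefault(package, entry)
    return provides


def make_demo_archive():
    """Create a tiny in-memory ZIP standing in for a distribution."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "lib/Foo/Bar.pm",
            "package Foo::Bar;\nour $VERSION = '0.02';\n1;\n",
        )
        archive.writestr("lib/Foo/Baz.pm", "package Foo::Baz;\n1;\n")
        archive.writestr("README", "package Not::Code;\n")
    return buffer.getvalue()
```

Calling `build_provides(make_demo_archive())` yields one entry per package, each with a `file` key and, where a version was found, a `version` key, which is the shape the provides section of META.yml expects.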

At the social event of YAPC::EU 2008, Andreas and a posse of other PAUSE admins, including me, sat down to talk about the directions our tools are heading as well as policies. I don't think anybody disagreed that it'd be great to have components of PAUSE available individually from CPAN. But that's one ambitious goal!

A long time ago, I had spent a significant amount of time on porting the PAUSE indexer code in order to be able to index PAR distributions for injection into a PAR::Repository. I could do all sorts of simplifications for that purpose. For example, .par files are always in ZIP format, no tarballs, etc.
Last night, I decided to take a shot at making the PAUSE indexer its own CPAN module.

But I failed.

It turned out to be very, very tightly woven into the whole PAUSE code. I'm really not sure how I got the PAR file scanner to work on the basis of the PAUSE indexer. So I switched to a less ambitious goal: split the PAR indexer out of the PAR::Repository code into the PAR::Indexer module (and distribution) for general consumption.

That's where it stands today. For the future, I figure adding some code back into the mix and making a more generic indexer distribution would get it up to producing 98% of the same results as the real PAUSE indexer. I can do this, but first I'd like to know: would you consider this useful?
And a challenge for all the testing gurus: How would you try to exhaustively test this thing against the PAUSE indexer?

Cheers,
Steffen

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Some of my BackPAN stuff is pulling out bits of PAUSE to index things. Ultimately I want my BackPAN work to do the same job as PAUSE but for a different purpose: make index files for whatever people want to do. (But PAUSE does all the user management stuff too.)

    Besides that, I'd eventually like to get a shadow PAUSE running just so there's an extra one lying around ready-to-go.

    And, once I do that, I should be able to branch the code and see if there are ways that we can uncouple parts of it. Since we'l

    • That was precisely my impression: not easy to uncouple. Having a second copy of PAUSE running for our hacking pleasure and potentially as a fallback would be great.

      I'd be all for hacking on this together, but you're right: Having a development copy of PAUSE would be indispensable.

      • I did some ugly things [perl.org] which run the actual mldistwatch code using some monkeypatches and a mock DBI object, etc. Mostly it is a matter of emulating the environment. Perhaps the code could be refactored to make this easier (so you don't need all the uglies.)

        And wouldn't it be cool if the mock DBI object could actually be a frontend to query the actual PAUSE via a web service?! Then you could know what is really going to happen if you upload your code right now, etc.

        • Or how about having the data in a couple of SQLite files locally for testing?