Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

schwern (1528)

schwern
  (email not shown publicly)
http://schwern.net/
AOL IM: MichaelSchwern (Add Buddy, Send Message)
Jabber: schwern@gmail.com

Schwern can destroy CPAN at his whim.

Journal of schwern (1528)

Saturday December 12, 2009
05:02 PM

gitPAN and the PAUSE index

[ #40014 ]

As you may or may not know, people on CPAN own modules (technically they own the namespace). Each Foo::Bar is owned by one or more CPAN accounts. Usually you gain ownership on a "first-come" basis, but it can also be transferred. Only the "official" tarball for a given namespace is indexed. So if the owner of Foo::Bar uploads Foo-Bar-1.23.tar.gz Foo::Bar will point at Foo-Bar-1.23.tar.gz. If I (presumably unauthorized) upload Foo-Bar-1.24.tar.gz the index will still point at Foo-Bar-1.23.tar.gz.

Here's the rub. Not owning a module doesn't stop you from uploading. It also says nothing about who owns the distribution. gitpan is by distribution. Now it gets a little more difficult to figure out who owns what. For example, look at MQSeries-1.30. All but two modules are unauthorized. BUT notice that MQSeries.pm is authorized. The CPAN index does point MQSeries at M/MQ/MQSERIES/MQSeries-1.30.tar.gz (everything else is at 1.29). Likely what we have here is a botched ownership transfer.

How do you mark that? search.cpan.org seems to take the strict approach, if anything's unauthorized its out. The CPAN uploads database I have available is the opposite, if anything is authorized its in. What to do?

Then there's stuff like lcwa. Looks like junk, but here's the thing. CPAN has a global module index to worry about, gitpan doesn't. Each distribution is its own distinct unit. So lcwa does no harm on gitpan, it can be recorded.

What does matter? The continuity of a distribution's releases, and this is precisely what CPAN does not track. It doesn't even have a concept of a distribution, just modules inside tarballs. CPAN authors playing nice with tarball naming conventions gives the illusion of a continuous distribution.

So... for a given release of a distribution (ie. a tarball), how does gitpan determine if the release should be included in the distribution's history? If we go strict, like search.cpan.org, we're going to lose legit releases and even entire distributions (like lcwa). If we let anything in gitpan is not showing an accurate history.

Add the complication that authorization changes. For example, the MQSeries module ownership will eventually be fixed. What then?

First pass through, gitpan is ignoring this problem. Its just chucking everything from BackPAN in. Second pass will rebuild individual repos with collected improvements. This is the first thing I'm not sure what to do about.

Suggestions?

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • It seems to me historical PAUSE permission records are what you need. Given such, along with BackPAN, you could reconstruct the permissions for any given package at any point, and go from there.

    It might be conceivable to map authors to branches in the repos in some way if that makes sense.

    The question is: what does it mean if I upload an unauthorised release 1.2 of a module whose version 1.1 is on CPAN and authorised, and then that author releases 1.5? Did he use the changes from my 1.2, ie. is it one conti