autarch (914)

http://www.vegguide.org/

Journal of autarch (914)

Friday September 21, 2007
02:20 PM

Ego mining CPAN data

[ #34515 ]

The other day, I was wondering what percentage of CPAN I have sent patches for. I was kind of hoping for a nice impressive number like 1%.

I wrote a little script that takes a local CPAN mirror (courtesy of CPAN::Mini), extracts the latest version of every module, and looks for my name or email address in files that look like changelogs. This obviously catches more than patches, since in some cases I just submitted a bug report or suggestion.
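For the curious, the changelog scan can be sketched roughly as below. This is a minimal illustration in Python, not the script described above: the file-name pattern, the function names, and the case-insensitive substring match are all assumptions, and it skips the step of picking the latest version of each distribution from the mirror.

```python
import re
import tarfile
from pathlib import PurePosixPath

# File names that "look like changelogs". This pattern is a guess for
# illustration, not the list the original script used.
CHANGELOG_RE = re.compile(
    r"^(changes|changelog|history|news)(\.\w+)?$", re.IGNORECASE
)


def looks_like_changelog(member_name):
    """Return True if the archive member's base name resembles a changelog."""
    return bool(CHANGELOG_RE.match(PurePosixPath(member_name).name))


def mentions(text, needles):
    """Return True if any needle (a name or email) occurs in text, ignoring case."""
    lowered = text.lower()
    return any(needle.lower() in lowered for needle in needles)


def tarball_credits(tarball_path, needles):
    """Return the changelog-like files in a .tar.gz that mention any needle."""
    hits = []
    with tarfile.open(tarball_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile() or not looks_like_changelog(member.name):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Changelogs come in assorted encodings; replace undecodable bytes.
            text = handle.read().decode("utf-8", errors="replace")
            if mentions(text, needles):
                hits.append(member.name)
    return hits
```

Calling `tarball_credits(path, ["Jane Doe", "jane@example.com"])` on each tarball in the mirror (with placeholder name and address swapped for real ones) would give the list of distributions to look at by hand.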

It's not quite perfect since some CPAN authors will say something like "applied patch from RT12345" without a name. I didn't want to fetch all those different tickets, since that'd take a long time.

So the list I came up with was this:

Apache::Compress, Apache::Filter, Apache::Session, App::Info, Catalyst::Action::REST, Class::Container, Class::Validating, CPAN, CPANPLUS, Data::Structure::Util, DateTime::Calendar::Chinese, DateTime::Calendar::Discordian, DateTime::Calendar::Hebrew, DateTime::Calendar::Julian, DateTime::Event::Recurrence, DateTime::Format::Duration, DateTime::Format::Natural, DateTime::Format::Strptime, DateTime::Incomplete, DateTime::Set, DateTime::Span::Birthdate, DBD::mysql, DBD::Pg, DBI, Devel::Cover, Email::Address, Exception::Class::TryCatch, ExtUtils::ModuleMaker, GD::SecurityImage, GraphViz, HTML::FillInForm, HTML::Tidy, IO::All, IPC::Shareable, Kwiki, Lingua::ZH::PinyinConvert, Log::Dispatch::Config, Log::Log4perl, mod_perl, Module::Build, Module::Signature, Net::SFTP::Foreign, Pod::Coverage, Set::Infinite, Spiffy, Spoon, Storable, SVK, SVN::Web, TAP::Parser, Test::Simple, Test::Taint, Thread::Pool, XML::Atom, XML::SAX::Expat, XML::SAX::Writer

It was fun to do this because I found a few cases where I'd totally forgotten having been involved.

This isn't quite 1%; it's closer to 0.5% (57 modules out of 12208). Of course, if you count the modules I've personally released, I end up with 97 modules, which is closer to 1% but still not quite there.
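The arithmetic behind those percentages, as a quick Python check. The helper name `share` is made up for this example; the counts are the ones quoted above.

```python
import math

TOTAL_MODULES = 12208  # size of the CPAN mirror quoted above


def share(count, total=TOTAL_MODULES):
    """Return count as a percentage of total."""
    return 100 * count / total


print(f"patched or reported: {share(57):.2f}%")
print(f"including my own releases: {share(97):.2f}%")

# Smallest module count that reaches a full 1% of the mirror.
needed_for_one_percent = math.ceil(TOTAL_MODULES / 100)
print(f"modules needed for 1%: {needed_for_one_percent}")
```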

BTW, my original goal was to build a database of who patched what, but parsing out the bazillion ways someone says "patch from so-and-so" is really hard, and the RT thing is still a big problem. There's also the problem of just figuring out identity, since people end up referred to in many ways: by full name, first name ("patch from Stas"), email address, or nicknames like CPAN ids or IRC nicks.
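To give a feel for why this is hard, here is a rough Python sketch of the naive approach. Everything in it is an assumption made for illustration: the two regexes, the `parse_credit` helper, and the hand-maintained alias table. It handles only "patch from/by" followed by an email address or capitalized words, and it will miss most real-world phrasings.

```python
import re

# "patch from X" / "patches by X", where X is an email address or a run of
# capitalized words. Only the keyword part is matched case-insensitively.
CREDIT_RE = re.compile(
    r"(?i:\bpatch(?:es)?\s+(?:from|by))\s+"
    r"(?P<who>[\w.+-]+@[\w.-]+|[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"
)

# RT ticket references such as "RT12345" or "RT #123".
RT_RE = re.compile(r"\bRT\s*#?\s*(\d+)", re.IGNORECASE)


def parse_credit(line, aliases):
    """Return (person, rt_ticket) for one changelog line; either may be None.

    aliases maps lowercased names, emails, or nicks to one canonical name.
    """
    credit = CREDIT_RE.search(line)
    ticket = RT_RE.search(line)
    person = None
    # "patch from RT12345" names a ticket, not a person, so skip that case.
    if credit and not RT_RE.match(line, credit.start("who")):
        raw = credit.group("who").rstrip(".")
        person = aliases.get(raw.lower(), raw)
    return person, (int(ticket.group(1)) if ticket else None)


# Hypothetical alias table; building this by hand is the identity problem.
ALIASES = {"stas": "Stas Example", "stas@example.com": "Stas Example"}

print(parse_credit("- applied patch from Stas", ALIASES))
print(parse_credit("Patch by Jane Doe (RT #123)", ALIASES))
print(parse_credit("applied patch from RT12345", ALIASES))
```

The last case is the RT problem in miniature: the sketch can tell that a ticket was involved, but the name of the patcher lives in the ticket, not in the changelog.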

It'd be pretty cool to get that data, though, since we could see things like modules with the most patchers, most patches, most frequent patchers, etc. Maybe I'll get back to this sometime.
