All the Perl that's Practical to Extract and Report

schwern (1528)

schwern
  (email not shown publicly)
http://schwern.net/
AOL IM: MichaelSchwern
Jabber: schwern@gmail.com

Schwern can destroy CPAN at his whim.

Journal of schwern (1528)

Monday December 14, 2009
05:27 PM

gitPAN is complete!

[ #40021 ]

The gitPAN import is complete.

From BackPAN
------------
118,752 files
10,440,348,937 bytes (measured by adding individual file size)
21,987 distributions (I skipped perl, parrot and parrot-cfg)

To git
------
21,766 repositories
4,495,204 bytes (measured by total disk usage
                                  after git gc with no checkout)
150 gigs on github (they have to index it)
12 days (lots of starts and stops)
1 laptop (1st gen Macbook)

I had to do it on a disk image because of OS X's case-insensitive filesystem.

I've written up a small FAQ. gitpan is reasonably stable, but you may have to rebase in the future.

Next, I take a break.

Then begins the second pass, mostly improving and adding tags. Here's the list of planned features. The second pass will be a rolling reimport of each distribution to bring everything up to the same standard, since there were a lot of incremental improvements during the first pass. I expect this to mean changes to commit logs and tags, with very little content change.

The issue of PAUSE ownership I'm going to punt on. It's ugly and can be done entirely in parallel. If someone else makes a historical distribution ownership database available, gitPAN will use it.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • So now we just need to submit those 20,000 projects to Ohloh and other source metrics'y places? :)

  • Thank you for doing this. This was a huge project and I think it's awesome.
  • I've been meaning to put Perl-RPM onto git for a while now, and have been mostly held back by the fact that I never converted it from CVS to Subversion. Now that you have as much history as I could realistically need, is there an easy way to just outright seed a new repo from the gitPAN repo? Not just filling in holes like you did with David, but essentially cloning it (without it being treated as a repo clone)...

    --

    --rjray

    • Yes, you can clone it. There's nothing special about a clone. Just change where the "origin" remote points to (presumably to your own repository) and push.

      $ git clone git://github.com/gitpan/Perl-RPM
      $ cd Perl-RPM/
      $ git remote rm origin
      $ git remote add origin git@github.com:rjray/Perl-RPM.git
      $ git push origin master

      That should do it. This presumes you've created the Perl-RPM project on github.
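
The same re-pointing can be tried offline first. What follows is only a sketch under stated assumptions: `seed`, `upstream.git` and `mine.git` are hypothetical local stand-ins for the gitPAN repository and your own empty project, and `git remote set-url` is swapped in for the `remote rm` / `remote add` pair above (it should have the same effect on where `origin` points).

```shell
#!/bin/sh
# Sketch: re-point a clone's "origin" and push, using local stand-in repos.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for git://github.com/gitpan/Perl-RPM
git init -q seed
cd seed
git config user.email "demo@example.com"   # throwaway identity for the demo commit
git config user.name "Demo"
echo "v1" > README
git add README
git commit -q -m "import v1"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git config
cd ..
git clone -q --bare seed upstream.git

# Hypothetical stand-in for your own, still empty, project
git init -q --bare mine.git

# Clone, change where "origin" points, and push the history
git clone -q upstream.git Perl-RPM
cd Perl-RPM
git remote set-url origin "$work/mine.git"
git push -q origin "$branch"

# The new repository now carries the imported history
git --git-dir="$work/mine.git" log --oneline
```

Against the real repositories, the `set-url` argument would be your own remote (for example the `git@github.com:rjray/Perl-RPM.git` URL used above) and the branch would be `master`.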