
All the Perl that's Practical to Extract and Report


jdavidb (1361)

jdavidb
  (email not shown publicly)
http://voiceofjohn.blogspot.com/

J. David Blackstone has a Bachelor of Science in Computer Science and Engineering and nine years of experience at a wireless telecommunications company, where he learned Perl and never looked back. J. David has an advantage in that he works really hard, he has a passion for writing good software, and he knows many of the world's best Perl programmers.

Journal of jdavidb (1361)

Wednesday October 16, 2002
11:27 AM

Suggestions for merlyn's mirror CPAN program

[ #8410 ]

I've been running my CPAN.pm shell sessions against a file:// URL for a while now, thanks to merlyn's recent CPAN mirror program (which pulls the bare minimum needed to create a usable CPAN repository: only the most recent version of each module). I noticed today though that CPAN.pm is still going to my previous first choice repository to download and compare checksums.

So, I added the following:

  • Declare an %authors hash just before the while loop that calls gzreadline
  • After calling my_mirror on the module distribution file, get just the directory part of its path with dirname, and add it to the hash as a key
  • For each key in the hash, call mirror (not my_mirror) on "authors/id/$key/CHECKSUMS"

Here's the actual patch, but you might prefer just the description:

--- mirrorcpan.perl.old    2002-10-16 11:03:32.000000000 -0500
+++ mirrorcpan.perl    2002-10-16 11:19:58.000000000 -0500
@@ -47,6 +47,7 @@
 "rb")
   or die "Cannot open details: $gzerrno";
my $state = 1;
+my %authors;
while ($gz->gzreadline($_) > 0) {
   if ($state == 1) {        # in header
     $state = 2 unless /\S/;
@@ -59,6 +60,18 @@

   my ($module, $version, $path) = split;
   my_mirror("authors/id/$path");
+  my $authordir = dirname $path;
+  $authors{$authordir} = 1;
+}
+
+foreach my $authordir (keys %authors) {
+  my $path = "authors/id/$authordir/CHECKSUMS";
+  my $source = URI->new_abs($path, $REMOTE)->as_string;
+  my $dest = catfile($LOCAL, $path);
+  mirror($source, $dest);
+  # we use mirror instead of my_mirror because my_mirror presumes a
+  # file is up to date if it exists, but CHECKSUMS will change
+  # contents but not names
}

## finally, clean the files we didn't stick there

This is being tested even as I speak. It may not work for you. It may not work for me. It may only work the first time I run it. :)
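
The steps described above can also be tried in isolation, without touching the network. This is only a sketch of the path bookkeeping, not part of the patch: the checksums_paths helper and the sample distribution names are made up for illustration, and the paths are assumed to look like the third column of the package details file (author directory plus file name).

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical sample of distribution paths (illustration only).
my @dist_paths = (
    'M/ME/MERLYN/Foo-1.00.tar.gz',
    'M/ME/MERLYN/Bar-0.02.tar.gz',
    'J/JD/JDAVIDB/Baz-0.10.tar.gz',
);

# Return one CHECKSUMS path per distinct author directory.
# (checksums_paths is an assumed helper name, not from the script.)
sub checksums_paths {
    my @paths = @_;
    my %authors;
    $authors{ dirname($_) } = 1 for @paths;    # hash keys collapse duplicates
    return map { "authors/id/$_/CHECKSUMS" } sort keys %authors;
}

print "$_\n" for checksums_paths(@dist_paths);
```

Using a hash keyed by directory means an author with many distributions still produces only one CHECKSUMS request.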

Update: Actually I screwed up and tested a version that called my_mirror instead of mirror. I had misunderstood the call semantics of my_mirror, which kept the version above from working (now fixed). Of course, this version now makes a connection for every CHECKSUMS file just to see whether it changed, which is pretty lame. Maybe someone can come up with a better idea.

Question: why are the my_mirror and clean_unmirrored subroutines wrapped in a BEGIN block? I understand why they are in a block, but why does it have to be compiled first?
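
One general reason for this idiom in Perl, offered as a guess rather than as the script's actual motivation: if the block holds an initialized lexical shared by the subs, BEGIN makes the initialization happen at compile time, so the subs work even when they are called from code that appears earlier in the file. The $last_id variable and next_id sub below are invented for the illustration.

```perl
use strict;
use warnings;

# Called at run time, before execution ever reaches the block below.
my $first = next_id();
print "first id: $first\n";

BEGIN {
    # Hypothetical private state. Because of BEGIN, this assignment runs
    # at compile time, so it has already happened when next_id() is
    # called above.
    my $last_id = 100;
    sub next_id { return ++$last_id }
}
```

With a plain block instead of BEGIN, the sub would still be defined at compile time, but the assignment would not yet have run when next_id() is first called, so the counter would start from an undefined value.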

  • I noticed today though that CPAN.pm is still going to my previous first choice repository to download and compare checksums.

    Are you sure you have the latest version [stonehenge.com]? There's code specifically in there to download the CHECKSUMS file to prevent exactly such an action:

        if ($path =~ m{^authors/id}) { # maybe fetch CHECKSUMS
          my $checksum_path =
            URI->new_abs("CHECKSUMS", $remote_uri)->rel($REMOTE);
          if ($path ne $checksum

    --
    • Randal L. Schwartz
    • Stonehenge
  • I just noticed that the Perlmonks [perlmonks.org] version is the buggy preliminary version. Please use the final version [stonehenge.com] instead.
    --
    • Randal L. Schwartz
    • Stonehenge
    • Thank you! Turns out my solution didn't work, anyway. It went and downloaded all the CHECKSUMS files ... then deleted them!

      --
      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
      • It went and downloaded all the CHECKSUMS files ... then deleted them!

        Heh! That's exactly what the very next version did for me.

        At least you were on the right track. Another 42 minutes or so, and you'd have ended up with my final version.

        The key was not running mirror needlessly. I ended up with a multi-stage algorithm, described in the accompanying text. The result is that I don't try to mirror any CHECKSUMS file for which I already have a local version and none of whose associated files have been updated.

        --
        • Randal L. Schwartz
        • Stonehenge
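
The rule described in the last comment can be written down as a small decision function. This is a sketch of that rule only, not the code from the final version of the script: the needs_checksums name, its arguments, and the sample data are all assumptions made for the illustration.

```perl
use strict;
use warnings;

# Decide whether an author directory's CHECKSUMS file should be mirrored.
#   $have_local    - true if a local CHECKSUMS copy already exists
#   $files_updated - number of files in that directory fetched this run
# (Hypothetical helper; the real script may track this differently.)
sub needs_checksums {
    my ($have_local, $files_updated) = @_;
    return 1 if !$have_local;              # nothing local yet: must fetch
    return $files_updated > 0 ? 1 : 0;     # refetch only if something changed
}

# Hypothetical per-directory state gathered during a mirror run.
my %dirs = (
    'M/ME/MERLYN'   => { have_local => 1, files_updated => 0 },
    'J/JD/JDAVIDB'  => { have_local => 1, files_updated => 2 },
    'N/NE/NEWCOMER' => { have_local => 0, files_updated => 0 },
);

for my $dir (sort keys %dirs) {
    my $state  = $dirs{$dir};
    my $action = needs_checksums(@{$state}{qw(have_local files_updated)})
        ? 'mirror CHECKSUMS'
        : 'skip';
    print "$dir: $action\n";
}
```

Directories with a local CHECKSUMS file and no updated distributions are skipped, which avoids one connection per author on every run.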