All the Perl that's Practical to Extract and Report

use Perl

Journal of bart (450)

Thursday January 08, 2009
05:29 PM

CPAN like it's 1995

[ #38242 ]

(title inspired by a blog post)

CPAN.pm used to ask a bunch of questions the first time it was run. One of the questions was which CPAN mirrors to use.

Now it doesn't any more: it comes preconfigured. But that comes at a price: a lot of distros simply assume use of http://cpan.perl.org/ or of http://www.cpan.org/, while several Perl ports use their own private CPAN mirror by default, such as Strawberry Perl for Windows and, I thought, ActiveState's ActivePerl.

Is the idea of using CPAN mirrors simply outdated? Or, should the CPAN client be smarter, and figure out for itself which mirrors to use? The latter feels like overkill to me. It presumes inclusion of a geolocator module and database, like Geo::IP (the free version of that database is far more than sufficient for this purpose, so the license price is no objection). But having that module and database on every Perl installation, just to get a list of mirrors once, or maybe a few times, in the lifetime of a perl installation, really is far too much.

I can remember how http://www.perl.com/CPAN, thanks to Tom Christiansen IIRC, used to have a built-in redirector, which figured out where in the world you were and hence which (single) mirror to use. But if that one mirror was offline, you were out of luck. It didn't check the status of the mirror, it just redirected you there.

If we still wish to use mirrors, why not drag CPAN into the age of web services? (Actually we're late for that, as the age of web services seems to have passed already... :)) Set up a main page on a site, for example on www.cpan.org, where CPAN.pm can simply ask "Can you suggest me what mirrors to use?" (pun intended). Then only the central site needs to have this geolocation database, to check what part of the world the request comes from and to compose a list of preferable mirrors. The output could be as simple as a text/plain page with one URL of a mirror per line, returning maybe 5 or 10 URLs in total. Easy to generate, and dead easy to parse.
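
To show how little parsing such a response would need, here is a minimal sketch. It is written in Python purely for illustration (CPAN.pm itself is Perl), the function name `parse_mirror_list` and the example hosts are hypothetical, and skipping lines that start with `#` is an assumption in case the service ever adds comments.

```python
def parse_mirror_list(body: str) -> list[str]:
    """Return the mirror URLs from a one-URL-per-line text/plain body."""
    mirrors = []
    for line in body.splitlines():
        url = line.strip()
        # Ignore blank lines. Ignoring '#' lines is an assumption,
        # not something the proposal above specifies.
        if not url or url.startswith("#"):
            continue
        mirrors.append(url)
    return mirrors


if __name__ == "__main__":
    # Hypothetical response body; the hosts are placeholders.
    sample = (
        "http://mirror-a.example/CPAN/\n"
        "\n"
        "ftp://mirror-b.example/pub/CPAN/\n"
    )
    for url in parse_mirror_list(sample):
        print(url)
```

The list keeps the order the server sent, so the server's preference ranking survives parsing.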

(Note: the order of mirrors that are close to each other in level of preference could be randomly shuffled for each request, so that users in one area don't all hammer the same mirror.)
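
The shuffling in that note could look roughly like this on the server side. This is a sketch in Python for illustration only: the idea of grouping mirrors into preference "tiers" and the name `order_mirrors` are assumptions about how a real service might represent "close in level of preference".

```python
import random


def order_mirrors(tiers, rng=random):
    """Flatten preference tiers into one list, most preferred tier first.

    Mirrors inside a tier are shuffled on every call, so clients in the
    same area do not all receive the same first mirror.
    """
    ordered = []
    for tier in tiers:
        group = list(tier)   # copy, so the caller's data is not modified
        rng.shuffle(group)   # random order within the tier only
        ordered.extend(group)
    return ordered


if __name__ == "__main__":
    # Hypothetical tiers for one region; the hosts are placeholders.
    tiers = [
        ["http://near-1.example/", "http://near-2.example/"],
        ["http://far-1.example/", "http://far-2.example/"],
    ]
    print("\n".join(order_mirrors(tiers)))
```

Nearby mirrors always come before distant ones; only the order within each tier varies between requests.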

CPAN.pm can still be made a bit smarter. For example, it could use ping to test the responsiveness of a mirror or, simpler still, time the fetch of a page from the currently chosen mirror and check whether it's fast enough. What counts as fast enough depends on your internet connection, so it should keep track of the responsiveness of the mirrors in order to compare them, and switch the order of mirrors only if that is likely to seriously improve matters.
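
A rough sketch of that bookkeeping, again in Python for illustration rather than as CPAN.pm code. The class `MirrorStats`, the helper `timed_fetch`, and the `speedup` threshold (reorder only when the best mirror is at least twice as fast as the current one) are all assumptions, not anything CPAN.pm provides.

```python
import time
import urllib.request
from collections import defaultdict


def timed_fetch(url, timeout=10.0):
    """Fetch a URL and return the elapsed time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


class MirrorStats:
    """Remember fetch times per mirror so they can be compared."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, mirror, seconds):
        self.samples[mirror].append(seconds)

    def average(self, mirror):
        times = self.samples.get(mirror)
        # Mirrors without measurements sort last.
        return sum(times) / len(times) if times else float("inf")

    def reorder(self, mirrors, speedup=2.0):
        """Re-sort by average fetch time, but only for a serious gain."""
        if not mirrors:
            return []
        current = self.average(mirrors[0])
        best = min(self.average(m) for m in mirrors)
        # Keep the order if the current mirror has no data yet, or if
        # the best mirror is not at least `speedup` times faster.
        if current == float("inf") or best * speedup > current:
            return list(mirrors)
        return sorted(mirrors, key=self.average)
```

A client would call `stats.record(mirror, timed_fetch(url))` after each download and `stats.reorder(urllist)` now and then. The threshold keeps the list from flapping between two mirrors of similar speed.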

Comments
  • So when do you think you can have a proof of concept written? :D
  • Why do you think this is a problem?

    Letting CPAN.pm configure itself with sensible defaults is a good thing - it creates a better user experience, and if for whatever reason you want to use a specific mirror it can easily be set using the "o conf urllist" command.

    • CPAN appears to take real pride in the fact that it has a huge network of 222 mirrors. Is that pride justified? Or is having such a network of mirrors just an outdated (1995), almost ridiculous concept?

      • I doubt whether anybody still modifies the default settings for CPAN.pm, once it works. I know I don't. That means that currently maybe, say, 70% of all installations use the same 3 or 4 repository servers, and that percentage can only go up.

        Do we really have to maintain the mirror network? Or can we thi

  • The output could be as simple as a text/plain page with one URL of a mirror per line

    Did you mean: text/uri-list [ietf.org]

  • CPAN.pm already writes some statistics about downloaded files into FTPstats.yml. This can be used to calculate the download speed and maybe re-configure the urllist in CPAN/Config.pm. For a quick start:

            perl -MYAML::Syck=LoadFile -e '$x=LoadFile shift; for (@{$x->{history}}) { warn ((-s "sources/$_->{file}") / ($_->{end}-$_->{start})) }' FTPstats.yml