The Zen of Comprehensive Archive Networks
I'll start negatively and end on hopefully more constructive notes; these, however, will build on the denials.
In the following, Mumble and mumble stand for any language other than Perl, or any combination of such languages.
First, the negative statements.
- CPAN shall not 'piggyback' other languages.
  (There shall not be a mumble/ top level directory.)
  - Rationale: CPAN is CPAN is CPAN. CPAN carries Perl. This implies all kinds of different contracts, explicit and implicit.
    - Some people in the Mumble community will take offense at CPAN carrying Mumble.
    - Some people in the Perl community will take offense at CPAN carrying Mumble.
    - Some CPAN mirrors will take offense at suddenly having to carry Mumble as well.
    - Some CPAN mirrors will become resource (bandwidth, disk) constrained after suddenly having to carry Mumble as well.
- CPAN cannot 'piggyback' other languages.
  - The building blocks or 'plumbing' of CPAN (the basic directory structure, PAUSE) are a reasonably good match for Perl. I'm not so certain that they are for all the other languages.
Now, on to the hopefully more constructive suggestions.
First and foremost-- I'm not against other language communities having a CPAN. I would love to have such archives. I'm willing to help the other language communities. I'm only against overly simple "let's just slap it on to the CPAN" solutions to the problem. Other languages are not like Perl; they are different, to a smaller or larger degree. Let's allow them their own degree of dignity and careful thought.
Then on to the technical questions, a.k.a. "How did you do it?" Well, people always ask me that and I go speechless... "Errrr, ummm, I kind of pulled all this stuff together and organized it a bit, and put it on an FTP server". After this a brooding silence always falls...
Well, that's not really it, of course. The above is how CPAN started. How it grew is another story. First, Larry designed Perl to grow by letting it have modules (in other words, namespaces). Then we had a couple of wise men (like Tim Bunce) who had the vision of good module naming guidelines. Finally, we had Andreas König, who single-handedly wrote PAUSE, the module submission machinery, where Perl module authors can register, submit, and manage their submissions. This allowed for rapid but still controlled growth of modules. Because of the growth, it finally became too arduous to know what was out there, and luckily Graham Barr's scratch for this itch became large enough to be published as search.cpan.org. Later backPAN was added by Andreas to hold all the old versions of submissions deleted by their authors; this ties back into simple basic things that the master server(s) must have, like good backups. Last but not least, feedback for the modules can be given through the RT ticketing system set up by Jesse Vincent. http://pause.cpan.org/ (or https://pause.cpan.org/)  http://search.cpan.org/  http://history.perl.org/backpan/  http://rt.cpan.org/
CPAN mirrors, then? How did they come about? The original ones, a dozen or so, were easy: I just asked the maintainers of the original ftp sites from which I had gathered the seeds of CPAN whether they might be interested in carrying this slightly bigger amalgamated Perl archive. Well, they foolishly agreed... I have to remind people once again that CPAN was conceived as an FTP archive. Not a website. And it still is that way. search.cpan.org just gives a nice interface. I'm sorry but I'm a dry CS engineer, not a graphic designer. Information, not animation. http://mirrors.cpan.org/
Oh, back to the CPAN mirrors. After the original ones, we grew slowly for a while, by word of mouth in the Perl community. However, since this was the time before billions of dollars' worth of fiber had been dug into the ground, Internet connections were still a bit dodgy and spotty. Therefore I started doing two things: scanning ftp logs for sites that obviously were mirroring CPAN but were not registered mirrors, and looking for sites that were good representatives of their particular top level domain, especially outside the big seven TLDs. This way I could track down where Perl was used and, by asking those sites to participate, push the load away from the master site. Later I also filled in missing countries by going for sites like the sunsites, and other vendor/public funded sites that had a good chance of having good connectivity. Usually I could find a sympathetic soul, oftentimes a system administrator.
Summary of the mirror tirade: I went for sites that liked and/or used Perl. I have no way of knowing off-hand whether they would like Mumble. The mirrors are donating their network and storage capacity and some amount of their administrative time to the Perl community. If we would like to extend that in any way we would have to ask them, each of them individually.
You can learn more about CPAN's history from the Perl timeline. Things didn't happen overnight. http://history.perl.org/PerlTimeline.html
A quite important thing for both the authors and the users is that the language must get the naming scheme of its modules right, or at least reasonably close. Perl's/CPAN's is far from perfect, but at least it was once designed, and it has been enhanced over the years as new needs have appeared. A good naming scheme allows hierarchical browsing, gives good hints for search engines (a good name is effectively a string of uniquely identifying keywords), and coordinates community efforts. Some sort of conflict resolution mechanism in case of competing and identically named implementations is important. Keeping all those guidelines well documented and all these processes public is important. One naming issue I think Perl 5 got wrong is that module namespaces are first-come-first-served: two or more different authors cannot have an identically named module. This may lead to unintentional or intentional squatting, which is not good for the community.
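To make the first-come-first-served point concrete, here is a minimal sketch in Python. It is a toy illustration only, not how PAUSE is implemented; the class, the function, and the author ids ALICE and BOB are all hypothetical.

```python
from typing import Dict, List, Optional


def name_keywords(module_name: str) -> List[str]:
    """Split a hierarchical name such as 'Net::FTP' into lowercase keywords."""
    return [part.lower() for part in module_name.split("::") if part]


class NamespaceRegistry:
    """Toy first-come-first-served registry: one owner per module name."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, module_name: str, author: str) -> bool:
        """Register the name if it is free; return True if `author` owns it."""
        owner = self._owners.setdefault(module_name, author)
        return owner == author

    def owner(self, module_name: str) -> Optional[str]:
        """Return the current owner of the name, or None if unclaimed."""
        return self._owners.get(module_name)


registry = NamespaceRegistry()
registry.claim("Net::FTP", "ALICE")   # first claim succeeds
registry.claim("Net::FTP", "BOB")     # second author is locked out
```

Whoever claims a name first keeps it, whether or not anything useful is ever published under it, and that is exactly the squatting risk. A conflict resolution mechanism would have to live outside such a registry, in public and documented community process.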
When designing your author/module/whatever hierarchy, think scalability. We originally got it wrong by having all authors as subdirectories in one single directory, which quickly became a bottleneck. (The solution to this was simply to 'hash' based on the leading two characters of the user ids.) Think also of several different views of your data: by author, by module, by category, by date, by keywords, and so forth. Don't think that hierarchical views alone will be enough: you will need searching capabilities.
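The directory 'hashing' can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual CPAN maintenance code: the function name is made up, the `authors/id` root follows the layout seen on CPAN mirrors, and uppercasing the id is my own assumption.

```python
def author_dir(user_id: str, root: str = "authors/id") -> str:
    """Spread author directories over two levels keyed on the leading characters.

    Instead of root/JHI, produce root/J/JH/JHI, so that no single
    directory has to list every author.
    """
    uid = user_id.strip().upper()  # assumption: ids are stored in uppercase
    if len(uid) < 2:
        raise ValueError("user id needs at least two characters")
    return "/".join([root, uid[0], uid[:2], uid])


print(author_dir("JHI"))  # authors/id/J/JH/JHI
```

The prefix directories bound the fan-out at every level without needing any lookup table: the path can always be recomputed from the user id alone.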
Get your license policy clear from day one. No, day minus one. In this day and age it is very important that every piece of software gets clearly marked as to what license it carries. Build your module packaging tools so that they suggest, maybe even demand, that the author picks a license. This way both the users of modules and distributors of software wanting to include the module don't have to keep guessing.
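A packaging tool that demands a license could do something along these lines. This is only a sketch: the metadata is assumed to be a plain dictionary with a `license` key, and the set of accepted identifiers is invented for the example rather than taken from any real packaging tool.

```python
# Hypothetical identifiers; a real archive would publish its own list.
KNOWN_LICENSES = {"perl_5", "artistic_2", "gpl_3", "mit", "bsd"}


def check_license(metadata: dict) -> str:
    """Return the normalized license id, or refuse to package without one."""
    license_id = str(metadata.get("license", "")).strip().lower()
    if not license_id:
        raise ValueError("no license declared; pick one before packaging")
    if license_id not in KNOWN_LICENSES:
        raise ValueError("unknown license %r; expected one of %s"
                         % (license_id, sorted(KNOWN_LICENSES)))
    return license_id


check_license({"name": "Some-Module", "license": "MIT"})  # accepted as "mit"
```

Failing at packaging time, rather than at upload or install time, puts the question in front of the one person who can actually answer it: the author.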
Very much related to the licensing is of course commercial use: CPAN took the easy and clear policy of no commercial software of any kind; not even share/guilt/donateware would be allowed. We felt that any other policy would be open to nitpicking, or maybe even legal challenges, and as a volunteer ragtag group we had no time or other resources for any of that.
Security? Should you have PGP keys and triply-written-in-blood signatures? Maybe. Currently CPAN has only MD5 checksums-- but so far they have been enough. There are some ongoing projects that enable using PGP keys for verifying the origin of the software; but as always with PKI systems, bootstrapping the web of trust is hard, some say not even worth the trouble.
Code quality? Ratings/reviews? Moderation/metamoderation? "Approved" SDKs? These are all hotly debated subjects and will not be addressed here, since the CPAN is and will stay an open and free forum, where the authors decide what they upload. Any further selection belongs to different fora.
The scripts that maintain the CPAN are dreadfully simple. They are
just simple shell scripts that copy sites A, B,
Andreas has the webserver code for PAUSE available online. That code is slightly more complex than the core CPAN scripts, or the scripts supporting the PAUSE; but even here, the code is there. No tricks up our sleeves.
There is no magic. All it takes is a few people who sit down and first get something running, a rough cut. Then iteratively enhance it. Perhaps the most demanding thing is commitment: someone must keep things running. A slowly decaying and dusty archive is almost worse (and certainly more sad) than no archive at all.
Oook and out.--
Jarkko Hietaniemi, the CPAN Master Librarian