use Perl
The Zen of Comprehensive Archive Networks
I'll start negatively and end with hopefully more constructive notes, though the latter will build on the denials.
In the following, Mumble and mumble stand for any language other than Perl, or a combination of languages other than Perl.
First, the negative statements.
- CPAN shall not 'piggyback' other languages.
(There shall not be a mumble/ top level directory.)
- Rationale: CPAN is CPAN is CPAN. CPAN carries Perl. This implies all kinds of different contracts, explicit and implicit.
- Some people in the Mumble community will take offense to CPAN carrying Mumble.
- Some people in the Perl community will take offense to CPAN carrying Mumble.
- Some CPAN mirrors will take offense at suddenly having to carry Mumble as well.
- Some CPAN mirrors will become resource (bandwidth, disk) constrained after suddenly having to carry Mumble as well.
- CPAN cannot 'piggyback' other languages.
- The building blocks or 'plumbing' of CPAN (the basic directory structure, the PAUSE) are a reasonably good match for Perl. I'm not so certain that they are for all the other languages.
Now, on to the hopefully more constructive suggestions.
First and foremost-- I'm not against other language communities having a CPAN. I would love to have such archives. I'm willing to help the other language communities. I'm only against too straightforward "let's just slap it on to the CPAN" solutions to the problem. Other languages are not like Perl, they are different, to a smaller or larger degree. Let's allow them their own degree of dignity and careful thought.
Then on to the technical questions, a.k.a. "How did you do it?" Well,
people always ask me that and I go speechless... "Errrr, ummm, I
kind of pulled all this stuff together and organized it a bit, and put
it on an ftp server". After this a brooding silence always falls...
"And...?"
Well, that's not really it, of course. The above is how CPAN started. How it grew is another story. First, Larry designed Perl to grow by letting it have modules (in other words, namespaces). Then we had a couple of wise men (like Tim Bunce) who had the vision of good module naming guidelines. Finally, we had Andreas König, who single-handedly wrote PAUSE [2], the module submission machinery, where Perl module authors can register, submit, and manage their submissions. This allowed for a rapid but still controlled growth of modules. Because of the growth, it finally became too arduous to know what was out there, and luckily Graham Barr's scratch to this itch became large enough to be published as search.cpan.org [3]. Later backPAN [4] was added by Andreas to hold all the old versions of submissions deleted by their authors; this ties back into simple basic things that the master server(s) must have, like good backups. Last but not least, feedback for the modules can be given through the RT ticketing system [5] set up by Jesse Vincent.
[2] http://pause.cpan.org/ (or https://pause.cpan.org/)
[3] http://search.cpan.org/
[4] http://history.perl.org/backpan/
[5] http://rt.cpan.org/
CPAN mirrors [6], then? How did they come about? The original ones, a dozen or so, were easy: I just asked the maintainers of the ftp sites where I had found the seeds of CPAN whether they might be interested in carrying this slightly bigger amalgamated Perl archive. Well, they foolishly agreed... I have to remind people once again that CPAN was conceived as an FTP archive. Not a website. And it still is that way. search.cpan.org just gives a nice interface. I'm sorry but I'm a dry CS engineer, not a graphic designer. Information, not animation.
[6] http://mirrors.cpan.org/
Oh, back to the CPAN mirrors. After the original ones, we grew slowly for a while, by word of mouth in the Perl community. However, since this was the time before billions of dollars' worth of fiber was dug into the ground, Internet connections were still a bit dodgy and spotty. Therefore I started doing two things: scanning ftp logs for sites that obviously were mirroring CPAN but were not registered mirrors, and looking for sites that were good representatives of their particular top-level domain, especially outside the big seven TLDs. This way I could track down where Perl was used and, by asking those sites to participate, push load away from the master site. Later I also filled in missing countries by going for sites like the sunsites, and other vendor/publicly funded sites that had a good chance of having good connectivity. Usually I could find a sympathetic soul, oftentimes a system administrator.
Summary of the mirror tirade: I went for sites that liked and/or used Perl. I have no way of knowing off-hand whether they would like Mumble. The mirrors are donating their network and storage capacity and some amount of their administrative time for the Perl community. If we wanted to extend that in any way, we would have to ask them, each of them individually.
You can learn more about CPAN's history from the Perl timeline [7]. Things didn't happen overnight.
[7] http://history.perl.org/PerlTimeline.html
A quite important thing for both the authors and the users is that the language must get the naming scheme of its modules right, or at least reasonably close. Perl's/CPAN's is far from perfect, but at least it was once designed, and it has been enhanced over the years as new needs have appeared. A good naming scheme allows hierarchical browsing, gives good hints for search engines (a good name is effectively a string of uniquely identifying keywords), and coordinates community efforts. Some sort of conflict resolution mechanism in case of competing and identically named implementations is important. Keeping all those guidelines well documented and all these processes public is important. One naming issue I think Perl 5 got wrong is that module namespaces are first-come-first-served: two or more different authors cannot have an identically named module. This may lead to unintentional or intentional squatting, which is not good for the community.
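To make the point about hierarchical browsing and names-as-keywords concrete, here is a minimal sketch in Python. It assumes Perl-style "::" separators; the helper names `build_tree` and `keywords` are made up for this illustration and are not part of any CPAN tool.

```python
def build_tree(module_names):
    """Fold 'A::B::C' style names into a nested dict for hierarchical browsing."""
    tree = {}
    for name in module_names:
        node = tree
        for part in name.split("::"):
            # Reuse the existing branch if present, otherwise create it.
            node = node.setdefault(part, {})
    return tree


def keywords(module_name):
    """Treat each name component as a lowercase search keyword."""
    return [part.lower() for part in module_name.split("::")]


if __name__ == "__main__":
    names = ["HTML::Mason", "HTML::Parser", "Net::FTP"]
    print(build_tree(names))
    print(keywords("HTML::Mason"))
```

A flat, unstructured name would give a search engine nothing to index beyond the name itself, and a browser nothing to descend into.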
When designing your author/module/whatever hierarchy think scalability. We originally got it wrong by having all authors as subdirectories in one single directory which quickly became a bottleneck. (The solution to this was simply to 'hash' based on the leading two characters of the user ids.) Think also several different views to your data: by author, by module, by category, by date, by keywords, and so forth. Don't think only hierarchical views will be enough: you will need searching capabilities.
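The 'hash' on leading characters can be sketched roughly like this. It is an illustration in Python, not the actual CPAN scripts: the function names, the `authors/id` root, and the choice to nest a one-character directory above the two-character one are all assumptions made for the example, and the author ids are invented.

```python
from collections import defaultdict


def author_dir(user_id, root="authors/id"):
    """Return a sharded directory for an author id (hypothetical helper).

    The id is placed under a one-character and a two-character prefix
    directory so that no single directory holds every author.
    """
    uid = user_id.strip().upper()
    if len(uid) < 2:
        raise ValueError("author id must have at least two characters")
    return f"{root}/{uid[0]}/{uid[:2]}/{uid}"


def bucket_sizes(user_ids):
    """Count how many authors land in each two-character bucket."""
    counts = defaultdict(int)
    for uid in user_ids:
        counts[uid.strip().upper()[:2]] += 1
    return dict(counts)


if __name__ == "__main__":
    ids = ["ALICE", "ALBERT", "BOB"]  # made-up ids for illustration
    for uid in ids:
        print(author_dir(uid))
    print(bucket_sizes(ids))
```

Counting bucket sizes now and then is a cheap way to notice when one prefix starts turning into the next bottleneck.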
Get your license policy clear from day one. No, day minus one. In this day and age it is very important that every piece of software gets clearly marked as to what license it carries. Build your module packaging tools so that they suggest, maybe even demand, that the author picks a license. This way both the users of modules and distributors of software wanting to include the module don't have to keep guessing.
Very much related to the licensing is of course commercial use: CPAN took the easy and clear policy of no commercial software of any kind, not even share/guilt/donateware would be allowed. We felt that any other policy would be open to nitpicking, or maybe even legal challenges, and as a volunteer ragtag group we had no time or other resources for any of that.
Security? Should you have PGP keys and triply-written-in-blood signatures? Maybe. Currently CPAN has only MD5 checksums-- but so far they have been enough. There are some ongoing projects that enable using PGP keys for verifying the origin of the software; but as always with PKI systems, bootstrapping the web of trust is hard, some say not even worth the trouble.
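A packaging tool that "demands" a license could be as small as the following sketch. Everything here is hypothetical: the `license` metadata key, the list of recognized names, and the function name are invented for illustration and do not describe any existing CPAN tool.

```python
def license_problems(metadata, recognized=("perl", "artistic", "gpl", "lgpl", "bsd", "mit")):
    """Return a list of complaints about the license field; empty means OK.

    `metadata` is assumed to be a plain dict describing one distribution.
    """
    license_name = (metadata.get("license") or "").strip().lower()
    if not license_name:
        return ["no license declared: pick one before uploading"]
    if license_name not in recognized:
        return [f"unrecognized license '{license_name}': please spell it out"]
    return []


if __name__ == "__main__":
    print(license_problems({"name": "Foo-Bar"}))
    print(license_problems({"name": "Foo-Bar", "license": "Perl"}))
```

Whether the tool only warns or refuses to build the package is a policy decision; the check itself is the same either way.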
Code quality? Ratings/reviews? Moderation/metamoderation? "Approved" SDKs? These all are hotly debated subjects and will not be addressed here since the CPAN is and will stay an open and free forum, where the authors decide what they upload. Any further selection belongs to different fora.
The scripts that maintain the CPAN are dreadfully simple. They are
just simple shell scripts that copy sites A, B,
Andreas has the webserver code for PAUSE available online. That code is slightly more complex than the core CPAN scripts, or the scripts supporting the PAUSE; but even here, the code is there. No tricks up our sleeves.
There is no magic. All it takes is a few people who sit down and first get something running, a rough cut, and then iteratively enhance it. Perhaps the most demanding thing is commitment: someone must keep things running. A slowly decaying and dusty archive is almost worse (and certainly more sad) than no archive at all.
Oook and out.
--Jarkko Hietaniemi, the CPAN Master Librarian

CPAN.pm (Score:1)
I honestly feel that without CPAN.pm, CPAN as a whole would not be as popular as it is today. Look at something like HTML::Mason, which has a half dozen dependencies. CPAN.pm Just Takes Care Of It[tm].
-Dom
P.S. I know that CPAN.pm has many flaws, but it makes up for them by 1) being useful and 2) being installed by default with perl.
Re:CPAN.pm (Score:2)
I guess I'll start maintaining the master copy of
this article somewhere at CPAN, once the feedback settles down.
Re:CPAN.pm (Score:1)
It's not been that long since I managed systems where CPAN.pm was completely unusable, and I had to do it all by hand--FTP, make, and the like. No automatic dependency checking, no fetching, no module lists, nothing. (Plus it was five miles uphill in the snow to the nearest mirror!) The only thing available was the base mirror functionality. And with that... CPAN was phe
Re:CPAN.pm (Score:2)
There may be some modules that don't install cleanly, or have strange external dependencies that they don't make clear
Re:CPAN.pm (Score:1)
Re:CPAN.pm (Score:2)
Actually, I'd say the one thing that really 'made' CPAN was search.cpan....once it caught on it made CPAN accessible to a much wider audience which is why it is so often confused for the archive itself. CPAN was a success just by existing at a time when you had to ftp to 15 different sites just to get the kit you wanted for your systems. CPAN.pm made it convenient and search.cpan made it navigable and less intimidating for those a lot less familiar with CPAN. WAIT and UWinnipeg had been around for at least
Re:CPAN.pm (Score:2)
MakeMaker (Score:1)
-Dom
It's easier than you think! (Score:1)
Namespaces (Score:1)
Mind you, I don't want to go down the route of SGML catalog files. That's too much like hard work.
-Dom
Re:Namespaces (Score:2)
I think the problem with using domain names is using domain names... that is, you make an implicit assumption that everybody
Another way of putting it is that using domain names works okay-ish for stabl-ish organizati
Easier than you think (easier to read)! (Score:1)
When I was working for ActiveState [activestate.com], I got to observe other language communities try (and try, and try) to duplicate CPAN.
They failed with depressing regularity by making it overcomplicated, or centralizing the work too much.
Decentralize!
If you want a community-based system, make the community do as much of the work as possible. No bottlenecks. The one centralized thing in PAUSE->CPAN is a mailing list which approves some changes in the naming hierarchy. This usually works ok but even now some peo
Re:Easier than you think (easier to read)! (Score:2)
RT Ticketing (Score:1)
Pity the search.cpan.org web interface changed about a week after RT was released. Still the blue and white is nostalgic.
One thing though - looks like the SSL certificate for rt.cpan.org expired on November 7th.
Gavin
Re:RT Ticketing (Score:1)
Source Code (Score:2)
I'd love to have a look at the source code for PAUSE and the other systems. Is it on the CPAN somewhere, or somewhere else online?
There's also CTAN (Score:1)
The Comprehensive TeX Archive Network [ctan.org]. For some reason I've always thought that CTAN preceded CPAN, but I'm not really sure which one was there first. Like CPAN, CTAN was conceived as an FTP-based service and then the web came and people moved on and you know the rest. Since I use both CTAN and CPAN on a regular basis, sometimes I find myself wishing CTAN to be more CPAN-like. The CTAN Catalog is superb, but I think the killer CPAN feature is the ability to browse the documentation in a nice easy to rea
Re:There's also CTAN (Score:2)
CPAN is "only" seven years old, while CTAN is, gee, older than that. I can't off-hand find out how old CTAN is.
Re:There's also CTAN (Score:1)
CTAN was an effort to bring together the separating ftp servers with TeX material. I'm proud to say that it was triggered by a podium discussion I organized at the EuroTeX conference 1991, in Paris. George came up with the name CTAN, I think I have his email still somewhere in my archives. I got involved since I ran one o
Re:There's also CTAN (Score:2)
(This history somewhere in the CTAN website would be neat.)
> CTAN was an effort to bring together the separating ftp servers with TeX material.
Sounds so very familiar...
> and had heavily modified mirror.pl from Lee for this purpose.
If you CTAN guys have any comments and/or suggestions for the "ZCAN" article, I would be more than happy to incorporate them.
Re:There's also CTAN (Score:1)
Maybe we'll find sometime the volunteers to transport this effort back to CTAN.
Actually, there's a lot in CPAN we'd like to have in CTAN as well, and never got around to it. Most important, something similar to PAUSE, and commonly agreed upon package structures.
Sigh, so much to do, so few tim