use Perl
The Zen of Comprehensive Archive Networks
I'll start negatively and end with, hopefully, more constructive notes; however, these will build on the denials.
In the following, Mumble and mumble stand for any language other than Perl, or for any combination of languages other than Perl.
First, the negative statements.
- CPAN shall not 'piggyback' other languages.
(There shall not be a mumble/ top-level directory.)
- Rationale: CPAN is CPAN is CPAN. CPAN carries Perl. This implies all kinds of different contracts, explicit and implicit.
- Some people in the Mumble community will take offense at CPAN carrying Mumble.
- Some people in the Perl community will take offense at CPAN carrying Mumble.
- Some CPAN mirrors will take offense at suddenly having to carry Mumble as well.
- Some CPAN mirrors will become resource-constrained (bandwidth, disk) after suddenly having to carry Mumble as well.
- CPAN cannot 'piggyback' other languages.
- The building blocks or 'plumbing' of CPAN (the basic directory structure, the PAUSE) is a reasonably good match for Perl. I'm not so certain that it is for all the other languages.
Now, on to the hopefully more constructive suggestions.
First and foremost: I'm not against other language communities having a CPAN. I would love to have such archives, and I'm willing to help the other language communities. I'm only against overly simplistic "let's just slap it on to the CPAN" solutions to the problem. Other languages are not like Perl; they are different, to a smaller or larger degree. Let's allow them their own degree of dignity and careful thought.
Then on to the technical questions, a.k.a. "How did you do it?" Well, people always ask me that and I go speechless... "Errrr, ummm, I kind of pulled all this stuff together and organized it a bit, and put it on an ftp server". After this a brooding silence always falls...
"And...?"
Well, that's not really it, of course. The above is how CPAN started; how it grew is another story. First, Larry designed Perl to grow by letting it have modules (in other words, namespaces). Then we had a couple of wise men (like Tim Bunce) who had the vision of good module naming guidelines. Finally, we had Andreas König, who single-handedly wrote PAUSE [2], the module submission machinery, where Perl module authors can register, submit, and manage their submissions. This allowed for rapid but still controlled growth of modules. Because of that growth, it eventually became too arduous to know what was out there, and luckily Graham Barr's scratch to this itch became large enough to be published as search.cpan.org [3]. Later, backPAN [4] was added by Andreas to hold all the old versions of submissions deleted by their authors; this ties back into simple basic things that the master server(s) must have, like good backups. Last but not least, feedback on the modules can be given through the RT ticketing system [5] set up by Jesse Vincent.
[2] http://pause.cpan.org/ (or https://pause.cpan.org/)
[3] http://search.cpan.org/
[4] http://history.perl.org/backpan/
[5] http://rt.cpan.org/
CPAN mirrors [6], then? How did they come about? The original ones, a dozen or so, were easy: I just asked the maintainers of the original FTP sites where I had found the seeds of CPAN whether they might be interested in carrying this slightly bigger amalgamated Perl archive. Well, they foolishly agreed... I have to remind people once again that CPAN was conceived as an FTP archive, not a website, and it still is that way. search.cpan.org just gives a nice interface. I'm sorry, but I'm a dry CS engineer, not a graphic designer. Information, not animation.
[6] http://mirrors.cpan.org/
Oh, back to the CPAN mirrors. After the original ones, we grew slowly for a while, by word of mouth in the Perl community. However, since this was the time before billions of dollars' worth of fiber was dug into the ground, Internet connections were still a bit dodgy and spotty. Therefore I started doing two things: scanning ftp logs for sites that obviously were mirroring CPAN but were not registered mirrors, and looking for sites that were good representatives of their particular top-level domain, especially outside the big seven TLDs. This way I could track down where Perl was used, and ask those sites to participate and take load off the master site. Later I also filled in missing countries by going for sites like the sunsites and other vendor-funded or publicly funded sites that had a good chance of having good connectivity. Usually I could find a sympathetic soul, oftentimes a system administrator.
Summary of the mirror tirade: I went for sites that liked and/or used Perl. I have no way of knowing offhand whether they would like Mumble. The mirrors are donating their network and storage capacity, and some amount of their administrative time, to the Perl community. If we wanted to extend that in any way, we would have to ask each of them individually.
You can learn more about CPAN's history from the Perl timeline [7]. Things didn't happen overnight.
[7] http://history.perl.org/PerlTimeline.html
One quite important thing for both authors and users is that the language must get the naming scheme of its modules right, or at least reasonably close. Perl's/CPAN's is far from perfect, but at least it was designed once, and it has been enhanced over the years as new needs have appeared. A good naming scheme allows hierarchical browsing, gives good hints to search engines (a good name is effectively a string of uniquely identifying keywords), and coordinates community efforts. Some sort of conflict resolution mechanism for competing, identically named implementations is important. Keeping all those guidelines well documented and all these processes public is important. One naming issue I think Perl 5 got wrong is that module namespaces are first-come, first-served: two or more different authors cannot have an identically named module. This may lead to unintentional or intentional squatting, which is not good for the community.
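A minimal sketch of the two properties discussed above, hierarchical browsing derived from "::"-separated names and first-come, first-served ownership, might look like the following. The class name and the author IDs are hypothetical, and this illustrates the idea only; it is not how PAUSE actually records its permissions.

```python
class NamespaceRegistry:
    """Hypothetical first-come, first-served module name registry.

    Module names are hierarchical, with components separated by "::",
    as in Perl (e.g. "Mumble::Parser").
    """

    def __init__(self):
        self.owners = {}  # module name -> author ID of the first registrant

    def register(self, name, author):
        """Claim `name` for `author`; return True if the author owns it.

        The first author to claim a name keeps it; a later claim by a
        different author is refused, which is where squatting can happen.
        """
        owner = self.owners.setdefault(name, author)
        return owner == author

    def children(self, prefix=""):
        """List the immediate sub-namespaces under `prefix`, for browsing."""
        prefix_parts = prefix.split("::") if prefix else []
        depth = len(prefix_parts)
        result = set()
        for name in self.owners:
            parts = name.split("::")
            if parts[:depth] != prefix_parts:
                continue  # not under the requested prefix
            if len(parts) > depth:
                result.add(parts[depth])
        return sorted(result)
```

For example, after ALICE registers "Mumble::Parser", a later register("Mumble::Parser", "BOB") returns False, while children("Mumble") lists every sub-namespace under Mumble for hierarchical browsing.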
When designing your author/module/whatever hierarchy, think scalability. We originally got it wrong by having all authors as subdirectories of one single directory, which quickly became a bottleneck. (The solution to this was simply to 'hash' based on the leading two characters of the user IDs.) Also think of several different views of your data: by author, by module, by category, by date, by keywords, and so forth. Don't assume hierarchical views alone will be enough: you will need searching capabilities.
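The two-character 'hashing' fix can be sketched as below. On CPAN mirrors the author tree is laid out as authors/id/, then the first letter of the ID, then its first two letters, then the full ID. The function name, the default root, and the upper-casing of IDs are assumptions of this sketch.

```python
import posixpath


def author_dir(author_id, root="authors/id"):
    """Return the directory that holds one author's uploads.

    Rather than one flat directory containing every author, each ID is
    nested under its first letter and then its first two letters, so no
    single directory has to list all authors at once.
    """
    author_id = author_id.upper()  # assumption: IDs are stored upper-case
    if len(author_id) < 2:
        raise ValueError("this sketch assumes IDs of at least two characters")
    return posixpath.join(root, author_id[0], author_id[:2], author_id)
```

For instance, author_dir("example") returns "authors/id/E/EX/EXAMPLE". The other views mentioned above (by module, category, date) would be separate indexes built on top of this tree rather than more directory levels.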
Get your license policy clear from day one. No, day minus one. In this day and age it is very important that every piece of software is clearly marked with the license it carries. Build your module packaging tools so that they suggest, maybe even demand, that the author pick a license. This way neither the users of modules nor the distributors of software wanting to include a module have to keep guessing.
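A packaging tool that suggests, or even demands, a license could gate on a check like the one below. The accepted license identifiers and the function name are hypothetical; a real archive would publish and document its own list.

```python
import warnings

# Hypothetical set of license identifiers accepted by the archive.
KNOWN_LICENSES = {"artistic", "bsd", "gpl", "lgpl", "mit", "perl"}


def check_license(metadata, strict=True):
    """Check that a distribution declares a known license before packaging.

    strict=True demands a license (raises ValueError);
    strict=False only suggests one (emits a warning).
    Returns whatever license value was declared (possibly None).
    """
    license_id = metadata.get("license")
    if license_id in KNOWN_LICENSES:
        return license_id
    choices = ", ".join(sorted(KNOWN_LICENSES))
    if license_id is None:
        message = "no license declared; please choose one of: " + choices
    else:
        message = f"unrecognized license {license_id!r}; please choose one of: " + choices
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return license_id
```

Running the check at packaging time, rather than at upload time, means the author is asked the question while the answer is still cheap to give.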
Very much related to licensing is, of course, commercial use: CPAN took the easy and clear policy of no commercial software of any kind; not even shareware, guiltware, or donateware would be allowed. We felt that any other policy would be open to nitpicking, or maybe even legal challenges, and as a volunteer ragtag group we had no time or other resources for any of that.
Security? Should you have PGP keys and triply-written-in-blood signatures? Maybe. Currently CPAN has only MD5 checksums, but so far they have been enough. There are some ongoing projects that enable using PGP keys for verifying the origin of the software; but as always with PKI systems, bootstrapping the web of trust is hard, and some even say it is not worth the trouble.
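Checking a download against a published MD5 checksum takes only a few lines. This sketch uses Python's hashlib and assumes the expected hex digest has already been obtained from the archive; how it is fetched and parsed is left out, and the function names are hypothetical.

```python
import hashlib


def md5_hex(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_hex(path) == expected_md5.strip().lower()
```

Note what this does and does not buy: a matching checksum detects corrupted or truncated transfers, but anyone who can replace the file can usually replace the checksum too, which is why verifying origin needs signatures such as the PGP work mentioned above.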
Code quality? Ratings/reviews? Moderation/metamoderation? "Approved" SDKs? These all are hotly debated subjects and will not be addressed here since the CPAN is and will stay an open and free forum, where the authors decide what they upload. Any further selection belongs to different fora.
The scripts that maintain the CPAN are dreadfully simple. They are just simple shell scripts that copy sites A, B,
Andreas has the webserver code for PAUSE available online. That code is slightly more complex than the core CPAN scripts or the scripts supporting PAUSE, but even here, the code is there. No tricks up our sleeves.
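To give a feel for how little the core copying step needs, here is a hypothetical one-way mirroring function in the spirit of the maintenance scripts. The real scripts are shell scripts whose details are not given here, so this is an illustration only. It treats a file as changed when its size or modification time differs, and it never deletes anything on the mirror side.

```python
import os
import shutil


def mirror_tree(src, dst):
    """One-way mirror: copy files from src that are missing or changed in dst.

    "Changed" means a differing size or modification time (whole seconds),
    a cheap heuristic. Nothing is ever deleted from dst.
    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            src_stat = os.stat(source)
            if os.path.exists(target):
                dst_stat = os.stat(target)
                if (dst_stat.st_size == src_stat.st_size
                        and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                    continue  # looks unchanged, skip it
            shutil.copy2(source, target)  # copy2 keeps mtime, so reruns skip it
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Because the copy preserves modification times, running the function again immediately copies nothing; a production mirror would also need to handle deletions, partial transfers, and permissions.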
There is no magic. All it takes is a few people who sit down and first get something running, a rough cut, and then iteratively enhance it. Perhaps the most demanding thing is commitment: someone must keep things running. A slowly decaying and dusty archive is almost worse (and certainly sadder) than no archive at all.
Oook and out.
--Jarkko Hietaniemi, the CPAN Master Librarian

CPAN.pm
(Score:1)( http://happygiraffe.net/blog/ | Last Journal: 2004.12.07 20:57 )
I honestly feel that without CPAN.pm, CPAN as a whole would not be as popular as it is today. Look at something like HTML::Mason, which has a half dozen dependencies. CPAN.pm Just Takes Care Of It[tm].
-Dom
P.S. I know that CPAN.pm has many flaws, but it makes up for them by 1) being useful and 2) being installed by default with perl.
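The "Just Takes Care Of It" part is essentially dependency ordering: before a module is installed, everything it requires must be installed first. A depth-first topological sort is one simple way to compute that order. The module names below are made up, and this is not a description of CPAN.pm's internals.

```python
def install_order(target, requires):
    """Return modules in an order where every dependency precedes its dependents.

    `requires` maps a module name to the names it needs. Uses a depth-first
    traversal and raises ValueError if it finds a dependency cycle.
    """
    order = []
    state = {}  # module name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle involving {name}")
        state[name] = "visiting"
        for dep in requires.get(name, ()):
            visit(dep)
        state[name] = "done"
        order.append(name)  # all of its dependencies are already in `order`

    visit(target)
    return order
```

With Mumble::App requiring Mumble::Template and Mumble::Cache, both of which require Mumble::Util, the shared dependency is installed once, before either of its dependents.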
It's easier than you think!
(Score:1)
...CPAN.pm or Search.CPAN.org [cpan.org] or ActiveState's ppm possible. But keep it extensible! There are a lot of things we *ought* to have standards for in perl modules, but my goal here is to convince you to keep it simple. So rather than mention them, I'll just advise you to keep a master file for each module, in some format that allows you to add extra fields later.
Easier than you think (easier to read)!
(Score:1)
When I was working for ActiveState [activestate.com], I got to observe other language communities try (and try, and try) to duplicate CPAN.
They failed with depressing regularity by making it overcomplicated, or by centralizing the work too much.
Decentralize!
If you want a community-based system, make the community do as much of the work as possible. No bottlenecks. The one centralized thing in PAUSE->CPAN is a mailing list which approves some changes in the naming hierarchy. This usually works ok but even now some people are frustrated with it.
The proposal for Perl 6's CPAN is that authors should be allowed to write modules with the same name. Joe's "foo.mumble" shouldn't pre-empt Bill's "foo.mumble". This sounds frightening, but I've thought about it a lot and it actually isn't.
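One way to read that proposal is that a module's identity becomes the pair (name, author) instead of the name alone. The sketch below is a hypothetical illustration of that idea, including an assumed rule for ambiguity: a bare name resolves only when exactly one author has published it.

```python
class AuthoredRegistry:
    """Hypothetical registry where a module is identified by (name, author)."""

    def __init__(self):
        self._versions = {}  # (name, author) -> latest version string

    def register(self, name, author, version):
        # Bill's upload never collides with Joe's, even if the names match.
        self._versions[(name, author)] = version

    def resolve(self, name, author=None):
        """Return (author, version) for `name`.

        If only one author has published `name`, no author is needed;
        if several have, the caller must say whose module is meant.
        """
        candidates = sorted(
            (a, v) for (n, a), v in self._versions.items()
            if n == name and (author is None or a == author)
        )
        if not candidates:
            raise LookupError(f"no module named {name!r}")
        if len(candidates) > 1:
            authors = ", ".join(a for a, _ in candidates)
            raise LookupError(f"{name!r} is ambiguous; choose an author: {authors}")
        return candidates[0]
```

Under this rule Joe's and Bill's "foo.mumble" coexist, unambiguous names keep working exactly as before, and only users of a contested name have to say which author they mean.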
Keep it simple!
The Python folks wanted to make a Zope-based archive maintained by experts who precompiled modules for various platforms, so there would be no compilation or build step for users. Installation would happen with web services trickery and so on.
These experts would also exercise their judgment about which modules were good, which modules to approve upgrades for, etc.
Too complicated! And too centralized!
Your first goal is to make it easy to submit code and redistribute it. Ease of use and quality control are not the central problems you are trying to solve here.
If your solution necessarily involves databases or web servers, I respectfully suggest you are making it too difficult. You can distribute CPAN on a CD-ROM.
Standardize!
But, (you say) I really want my language's archive to surpass the ease of use of Perl's CPAN.
Here's how: build a stable, simple base that the rest of the community can write hooks for.
Standardize on ways of installing dependencies, installing the module, and testing it. And finally, there should be a standard way of obtaining the documentation from the module.
Perl has imperfect but widely adopted standards for all of these, and that is what makes tools like CPAN.pm or Search.CPAN.org [cpan.org] or KobeSearch [cpan.org] possible.
But keep it extensible!
There are a lot of things we *ought* to have standards for in perl modules, but my goal here is to convince you to keep it simple.
So rather than mention them, I'll just advise you to keep a master file for each module, in some format that allows you to add extra fields later.
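A master file along these lines could be as simple as a JSON document with a few required fields for the standard steps named above (dependencies, install, test, documentation), while any extra fields are carried along untouched. The field names and the choice of JSON are assumptions of this sketch, not taken from any CPAN metadata specification.

```python
import json

# Hypothetical required fields, one per standardized step plus identity.
REQUIRED_FIELDS = ("name", "version", "requires", "install", "test", "docs")


def load_master_file(text):
    """Parse a module's master file (JSON here, as an assumption).

    Required fields are checked; any other fields are kept, untouched,
    under "extra", so new fields can be added later without breaking
    older tools that do not know about them.
    """
    data = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError("master file is missing: " + ", ".join(missing))
    known = {field: data[field] for field in REQUIRED_FIELDS}
    known["extra"] = {k: v for k, v in data.items() if k not in REQUIRED_FIELDS}
    return known
```

The design choice is the "extra" bucket: front-ends that understand a new field can use it, and everything else simply passes it through, which keeps the base simple while leaving room to grow.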
final note:
I helped write the current generation of ppm, ActiveState's tool to precompile modules server-side and make it easy to install them client-side.
Some people believe that it should be that easy, from day one, in their languages.
But! That tool is only a going concern because there were a lot of standardized modules to begin with, which made it worth ActiveState's time to devote the extra effort to writing a layer on top of CPAN.
Obviously, one shouldn't have to rely on a for-profit company to write a tool like ppm. But the cost-benefit problem is the same. Make it easy for there to be multiple front-ends!
RT Ticketing
(Score:1)( http://brock-family.org/gavin/ )
Pity the search.cpan.org web interface changed about a week after RT was released. Still, the blue and white is nostalgic.
One thing though: it looks like the SSL certificate for rt.cpan.org expired on November 7th.
Gavin
Source Code
(Score:2)( http://2shortplanks.com/ | Last Journal: 2004.10.07 7:32 )
I'd love to have a look at the source code for PAUSE and the other systems. Is it on the CPAN somewhere, or somewhere else online?
There's also CTAN
(Score:1)
The Comprehensive TeX Archive Network [ctan.org]. For some reason I've always thought that CTAN preceded CPAN, but I'm not really sure which one was there first. Like CPAN, CTAN was conceived as an FTP-based service, and then the web came and people moved on and you know the rest. Since I use both CTAN and CPAN on a regular basis, sometimes I find myself wishing CTAN were more CPAN-like. The CTAN Catalog is superb, but I think the killer CPAN feature is the ability to browse the documentation in a nice, easy-to-read format. (La)TeX packages have great documentation, but you sort of require a DVI or PS viewer, and those aren't really documentation browsing tools (cutting and pasting code is hard or impossible).
Thank you for bringing CPAN to life, it's just wonderful.