All the Perl that's Practical to Extract and Report

use Perl


Tuesday October 08, 2002
07:37 AM

Psyching out SourceForge

[ #8242 ]

I use SourceForge.

Somehow that sounds like something I should say at a 12-step meeting. I know other people in the Perl community use SourceForge, and they probably barely tolerate it, like I do.

Or like I did.

I still use SourceForge, but only vicariously. I finished a Perl script to interact with it for me.
I only really need the web interface to release files. SourceForge keeps a historical archive of all of my releases, and CPAN keeps the latest versions. Now that CPAN is 1.3 GB, I like to keep my directory small. The Schwartz Factor is a low 0.17 this week, meaning that about five-sixths of CPAN is either old versions or perl itself. No sane person mirrors SourceForge or wants to put it on a CD, so I keep a huge archive there.

The problem is PHP. PHP is almost always a problem because it mixes presentation with infrastructure. If SourceForge had a solid backend that did not depend on a web server, people could easily hack up SOAP, XML-RPC, or other sorts of interfaces to automate most of the mundane tasks (like releasing files). Instead, everything is tied up in tortuous PHP code that is, at best, half implemented in spots. The error message function, for instance, almost always falls back to the default error message, which is no help at all when I am trying to debug my SourceForge user-agent.

For the past couple of weeks I have been trying to automate as much of my module release process as I can. The less I do myself, the fewer mistakes I make, the less I forget to do, the fewer messages I get from CPAN Testers, and the more everyone else wins. Automating testing and PAUSE uploads was easy.

Automating SourceForge was hard. I do not think anyone has done it before, or if they have, I could not find anyone talking about it. The SourceForge people have half-heartedly promised XML-RPC interfaces before, but since they are locked into PHP they would have to reimplement everything to get that to work, and then they would have two development tracks. Any technology that sacrifices flexibility should be taken outside and shot. Remember why the web is so popular: anything can be behind those URLs, whether Perl, Java, Python, or C. If I change the backend, the URL stays the same and the user does not have to know it. Why is mod_perl so easy to use right away? Apache::Registry simply takes over the CGI scripts. Not so with PHP. Almost every PHP system I have seen locks the user into a particular way of doing things and, once there, never lets them escape. It is easy to get cool HTML effects right away, and it is easy to learn, but it is horribly limited and limiting.

No matter what I think, though, SourceForge still uses PHP so I just have to deal with it. I like most of SourceForge, but what I really need now is something to do the monkey work for me. I do not want to spend my time pointing and clicking when I can write a program to make those decisions for me.

SourceForge uses its own login system (PAUSE uses Basic authentication), so I have to track cookies. This is rather painful because I have to attach the cookies to each LWP request and extract them from each response. Several times while writing the script I forgot to do one or the other somewhere in the five-page process and had to carefully analyze HTTP sniffer logs to figure out what I messed up. SourceForge also does a redirection once I log in, and although WWW::Mechanize does not work in this situation (it needs about five more lines of code that Andy will now probably make me write :), using it gave me the tiny bit of insight I needed to crack the problem. These are the sorts of things that happen when the user interface is also the backend.
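
The cookie bookkeeping described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Perl/LWP code from my script: the `CookieTracker` class, the `follow_login` helper, and the header values in the demo are all hypothetical names and data invented for the example. It only shows the two steps that are easy to forget (extract cookies from every response, attach them to every request) plus carrying them across a login redirect.

```python
from http.cookies import SimpleCookie


class CookieTracker:
    """Remember cookies from responses and replay them on later requests."""

    def __init__(self):
        self.cookies = {}

    def extract(self, response_headers):
        """Store every cookie found in a list of (name, value) response headers."""
        for name, value in response_headers:
            if name.lower() == "set-cookie":
                parsed = SimpleCookie()
                parsed.load(value)
                for key, morsel in parsed.items():
                    self.cookies[key] = morsel.value

    def attach(self, request_headers):
        """Return a copy of the request headers with a Cookie header added."""
        headers = dict(request_headers)
        if self.cookies:
            headers["Cookie"] = "; ".join(
                f"{key}={value}" for key, value in self.cookies.items()
            )
        return headers


def follow_login(tracker, login_response_headers):
    """Extract cookies from a login response and build the follow-up request.

    Returns (url, headers) for the redirect target, or None if the
    response did not redirect.
    """
    tracker.extract(login_response_headers)
    for name, value in login_response_headers:
        if name.lower() == "location":
            return value, tracker.attach({})
    return None


if __name__ == "__main__":
    # Made-up headers standing in for a login response; not real SourceForge output.
    tracker = CookieTracker()
    login_response = [
        ("Set-Cookie", "session=abc123; Path=/"),
        ("Location", "/account/"),
    ]
    print(follow_login(tracker, login_response))
```

Keeping both steps in one object means that forgetting to attach or extract on one of the five pages becomes a single missing call rather than a hunt through sniffer logs. A cookie-jar class that hooks into the user-agent would do the same job with less hand-holding.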

SourceForge also assigns all sorts of tricky numbers to various things. Each project has a number, each package in the project has a number, and so on. I had to wade through a lot of HTML to discover those numbers for each of my projects. Now I have to communicate those numbers to my release script on a per-distribution basis, which means each distribution gets a new, hidden configuration file to store that stuff.
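
A per-distribution configuration file like that might look something like the sketch below. It is in Python for illustration only, and everything specific in it is an assumption: the `.releaserc` file name, the `[sourceforge]` section, and the `group_id` and `package_id` keys are placeholders I made up here, not the names my script actually uses.

```python
import configparser
import tempfile
from pathlib import Path

CONFIG_NAME = ".releaserc"  # hypothetical hidden file kept in each distribution


def load_release_ids(dist_dir, filename=CONFIG_NAME):
    """Read the numeric project and package IDs for one distribution."""
    path = Path(dist_dir) / filename
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(f"no release config at {path}")
    section = parser["sourceforge"]  # hypothetical section name
    # Key names are assumptions; the values are the numbers dug out of the HTML.
    return {key: section.getint(key) for key in ("group_id", "package_id")}


if __name__ == "__main__":
    # Write a throwaway config with made-up numbers and read it back.
    with tempfile.TemporaryDirectory() as dist_dir:
        (Path(dist_dir) / CONFIG_NAME).write_text(
            "[sourceforge]\ngroup_id = 1234\npackage_id = 5678\n"
        )
        print(load_release_ids(dist_dir))
```

The point of the file is that the numbers only have to be discovered once per distribution; after that the release script reads them instead of scraping them again.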

In all, the code to automate a file release to SourceForge turns out to be about 200 lines of Perl---a huge number. But that represents two forms and five complicated HTTP requests with plenty of print statements for debugging.

Want to see the script for yourself? It is on SourceForge. :)