Monday September 15, 2008
11:57 PM

Index of The Perl Journal articles on Dr. Dobbs Online

I put together an index of the TPJ articles that appear on the Dr. Dobbs site. It covers from November 2003 to January 2006. Apparently I was the last person to publish an article as part of TPJ.

01:06 PM

PDFs of CMP-era The Perl Journals?

CMP published The Perl Journal as PDFs for a couple of years, around 2004ish. I've lost my collection of those. Does anyone have them?

I was looking for them when I was answering a question about finding old TPJ articles at Stack Overflow.

01:03 PM

Stack Overflow

Stack Overflow, the programmer's question site that Joel Spolsky helped design this summer, is now public. It looks like a cross between reddit and Perlmonks.

It does mix all of the questions together, so you'll see a VB question next to a PHP question next to a Perl question, but it also has tagging. To see the Perl questions, listed most recent first, just go to the page for Perl tags.

It also looks like it might have the unintended consequence of becoming a Web 2.0 version of IRC: despite the FAQ saying that subjective questions (favorite X, etc.) should not be asked, the subjective questions seem pretty popular. Along with that, I think the voting will go pear-shaped if they get the wrong community, which I already think is happening. It's actually depressing that someone can build a good tool or service and have it go awry for reasons totally out of their control. It's predictable even, but there's really not that much that you can do technologically to prevent it.

John Siracusa has been doing a good job, and the way to avoid disaster is to get more good people there. You just need an OpenID account.

12:36 AM

BackPAN Indexer has components and another interface

I've been doing more work on my BackPAN Indexer, but the sort that doesn't do any indexing. What I really need to do is be home for more than one day at a time so I can get everything set up on a computer that I don't have to use. Indexing tens of thousands of distributions takes over my MacBook's poor disk drive, and then I can't do much.

So, in the meantime, I'm cleaning up the structure of the code so I can make things pluggable. There are several components that you can plug in: a Queue class that makes the list of things to process, a Worker class that defines the work to do on each thing in the queue, a Reporter class to store the Worker's results, a Dispatcher to hand out work to the Workers, and an Interface to show the live run information.

It's turned out to be a really nice design (that's getting better). I know I have a good design when the things I want to do next naturally fall out of the design. For instance, I wanted to test the Tk Interface and move things around, but I didn't want to actually process anything. I started working on a test script to mock everything, then I realized I didn't really need mocks because I could plug in null classes to handle the Queue, which would be empty or not, and the Worker, which would just do nothing.
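The plug-in idea can be sketched roughly like this. It is a minimal illustration in Python, not the indexer's Perl code, and every class and method name here (`NullQueue`, `SerialDispatcher`, `run_indexer`, and so on) is hypothetical. It only shows how null components can stand in for real ones so the interface gets exercised without doing any indexing.

```python
# Hypothetical sketch of the plug-in layout; none of these names
# come from MyCPAN::Indexer.

class NullQueue:
    """Queue component that yields nothing to process."""
    def items(self):
        return []

class ListQueue:
    """Queue component backed by a fixed list of distribution names."""
    def __init__(self, names):
        self.names = list(names)

    def items(self):
        return list(self.names)

class NullWorker:
    """Worker component that does no indexing at all."""
    def process(self, item):
        return None

class CollectingReporter:
    """Reporter component that keeps the Worker's results in memory."""
    def __init__(self):
        self.results = []

    def report(self, item, result):
        self.results.append((item, result))

class LogInterface:
    """Interface component that records status lines instead of drawing them."""
    def __init__(self):
        self.lines = []

    def show(self, message):
        self.lines.append(message)

class SerialDispatcher:
    """Dispatcher that hands each queued item to a single worker, in order."""
    def run(self, queue, worker, reporter, interface):
        for item in queue.items():
            interface.show(f"processing {item}")
            reporter.report(item, worker.process(item))
        interface.show("done")

def run_indexer(queue, worker, reporter, dispatcher, interface):
    """Wire the five components together and run one pass."""
    dispatcher.run(queue, worker, reporter, interface)
    return reporter, interface
```

Swapping `ListQueue` for `NullQueue`, or a real worker for `NullWorker`, changes what the run does without touching the dispatcher or the interface, which is the property the null classes rely on.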

So, as I've been moving around the country, I've been working on coding at least two examples of each component. That's a really nice way to design things. A design that works for one thing might not work for another thing. Forcing myself to come up with two non-trivial examples shakes out some of that stuff. This stuff is going to be important if people want to run the indexer themselves, or if someone ever sets up a CPAN Testers style group to do it (i.e. the dispatcher can distribute work around the world).

This week's work has been to create a Curses interface (because if I can't run it in a terminal, it's not Scottish), and something I'm calling the Test Census, which just counts the Test:: modules instead of doing the full indexing (and storing the large amounts of data I collect). If you're looking in the git repo, check out the test_counter branch. There is something a bit broken with the dispatching or the reporting somehow, but I'll think about that next week.
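A census like that could look roughly like the sketch below. It is written in Python for brevity and is not the code in the test_counter branch. It assumes that "counting the Test:: modules" means counting how many files load each `Test::` module through a `use` or `require` line; the regular expression and the function name are illustrative guesses.

```python
import re
from collections import Counter

# Assumed heuristic: a line that starts with use/require followed by Test::Something.
TEST_MODULE = re.compile(r"^\s*(?:use|require)\s+(Test(?:::\w+)+)", re.M)

def count_test_modules(sources):
    """Count how many source texts load each Test:: module.

    Each module is counted once per source text, however many
    times that text loads it.
    """
    census = Counter()
    for text in sources:
        census.update(set(TEST_MODULE.findall(text)))
    return census
```

A plain line-based match like this misses modules loaded in other ways (for example through a string `eval`), so the counts are a lower bound.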

I think the full index of BackPAN will start next week, and I expect it to run for about a week. I figure the error rate will be about 5%, like it was for MiniCPAN, and I'll then spend some time improving the indexer. While the indexer is running, I'll work on the other bits.

Sunday September 07, 2008
11:16 AM

I'm on YAPC TV: BackPAN Archeology

My YAPC::EU 2008 BackPAN Archeology talk is on YAPC.tv, where you can watch the high resolution or low resolution video, or even download the Flash or MPEG video.

You can't see the slides, but they are on SlideShare.

In the talk I mention that many things are in progress, although I just uploaded a demonstration video of it actually working on MiniCPAN. My previous post details that.

Saturday September 06, 2008
01:21 PM

Cataloging BackPAN: MiniCPAN done in 9 hours

My BackPAN indexer (YAPC::EU 2008 slides) made its first complete pass through my MiniCPAN yesterday:

  • Distributions processed: 16039
  • Indexing failures: 782 (4.8%)
  • Run time: 9 hours (0.49 dists / sec)

The total size of BackPAN is about 100,000 distributions, so I think this means that I could index all of BackPAN in less than a week.
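The arithmetic behind that estimate, as a small sketch. It assumes the MiniCPAN throughput holds steady across all of BackPAN, which older and odder distributions may not honor; `estimate_days` is a name made up for this example.

```python
def estimate_days(total_dists, sample_dists, sample_hours):
    """Extrapolate total run time from a sample run at constant throughput."""
    rate_per_sec = sample_dists / (sample_hours * 3600)
    seconds = total_dists / rate_per_sec
    return rate_per_sec, seconds / 86400

# 16,039 distributions in 9 hours, scaled up to about 100,000 distributions.
rate, days = estimate_days(100_000, 16_039, 9)
print(f"{rate:.3f} dists/sec, about {days:.1f} days")
```

At that rate the full run works out to a bit over two days of pure processing, which leaves plenty of slack inside the one-week guess.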

Right now I output everything as YAML, one YAML file per distribution. The data organization is sloppy and sometimes redundant because I haven't paid attention to it. You can get the tarball of all 16,000 files. Take a look to see if there might be anything else you'd want the indexer to record about a distribution. If you're interested in making some sort of CPAN service, let me do the work of cataloging the information you need.

If you want to play with this, get MyCPAN::Indexer from CPAN Search, or if you want to play with everything, check out the sources from GitHub. You probably can't install it from CPAN since it depends on a couple of modules which only have developer releases right now.

The thing you'd want to play with is examples/backpan_indexer.pl. It's a little messy right now because I bolted on a Tk interface (see video one or two) that lives in examples/tk.pl and a dispatcher that lives in examples/steak.pl. My next step is to make those pluggable modules so you can note in the configuration file which interface and dispatcher you want, and as long as they have the right interface, they'll do whatever they do.

After a little bit more work on the indexing stuff, the next step is to take all of those YAML files and distill them into something that is easier to search, then hook up some sort of search interface to them. I'll probably first write a command-line tool (although with wonderful MVCness). I want to feed the index any file in @INC and get a report:

$ cpan_index `perldoc -l Foo`
Foo.pm's fingerprint found in Foo-Bar-0.05.tgz
    Author: Joe Snuffy (SNUFFY@cpan.org)
    Release date: Nov 11, 1998, 23:59:59
    Version: 0.05
    Latest version on CPAN: Foo-Bar-0.06.tgz
    Current maintainers:
        Joe Snuffy (SNUFFY@cpan.org)  (first come)
        Joe Cool (CAMEL@cpan.org)     (co-maintainer)
    Also came with:
        !!!Bar.pm, installed version 0.08 (does not match Bar.pm from Foo-Bar-0.05.tgz)
        ABC.pm, installed version 0.05 (matches ABC.pm in Foo-Bar-0.05.tgz)
    Depends on:
        Baz.pm from Baz-0.67.tgz
        Quux.pm from Quux-0.01.tgz
    CPAN Testers Matrix: ...
    Release history:
        0.01  Dec 31, 1969, 23:59:59  SNUFFY  (BackPAN)
        0.02  Jan 31, 1995, 23:59:59  SNUFFY  (BackPAN)
        0.03  Jun 6,  1996, 23:59:59  SNUFFY  (BackPAN)
        0.04  Oct 31, 1997, 23:59:59  SNUFFY  (BackPAN)
    ****0.05  Nov 11, 1998, 23:59:59  SNUFFY  (CPAN)
        0.06  Sep  5, 2008, 23:59:59  CAMEL   (CPAN)
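The lookup step behind a report like that can be sketched as follows, in Python rather than Perl. It assumes a "fingerprint" is simply a SHA-1 digest of the file's bytes and that the distilled index maps each digest to the distributions containing that file. Both are assumptions made for this example, not a description of what the indexer stores, and `build_index` and `lookup` are hypothetical names.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Digest of a file's contents (assumed here to be SHA-1)."""
    return hashlib.sha1(data).hexdigest()

def build_index(distributions):
    """Map each file digest to the (distribution, path) pairs that contain it.

    distributions: {dist_name: {path: file_bytes}}
    """
    index = {}
    for dist, files in distributions.items():
        for path, data in files.items():
            index.setdefault(fingerprint(data), []).append((dist, path))
    return index

def lookup(index, data: bytes):
    """Return every (distribution, path) whose file matches these bytes."""
    return index.get(fingerprint(data), [])
```

An exact digest only matches byte-identical files, so a locally patched Foo.pm would come back with no matches; the rest of the report (author, release history, and so on) would hang off the distribution name that the lookup returns.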

Tuesday September 02, 2008
05:54 PM

Don't want SourceForge email? Delete your account.

So, I've now stopped using SourceForge completely, but I'm not deleting anything from there. Since I get the SourceForge emails, I figured I'd now opt out. I never read them anyway.

This message was sent on behalf of SourceForge.net based on
the existence of your user account on our site.
 
To unsubscribe from future mailings, login to the SourceForge.net site
and request account removal at:
http://sourceforge.net/account/remove_account.php

Yep, their opt-out policy is to delete your account and remove you from all projects. I think I might have unsubscribed in my "Account Options" summary, but with SourceForge you can never tell where the right thing is.

So, what's the rumor around the campfire? Is SourceForge looking to cut down on users?

01:21 PM

Automatic license detection in Perl distributions

Before I reinvent the wheel...

So, given an arbitrary Perl distribution (anything in BackPAN), does anyone have code to guess the license for the code? I know that I can look in META.yml if someone typed the right things somewhere, but that's not enough. Many distributions aren't well-formed, and there are a lot of different ways to license code.

And does anyone know of Perl distributions that contain files under different licenses, so that the distribution isn't covered by a single license? File A is under License Foo but File B is under License Quux, or something like that.
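For what it's worth, the crudest version of license guessing is a set of phrase matches run over each file's text. The sketch below is in Python, not Perl, and it is not an existing module. The labels and patterns are illustrative assumptions, nowhere near a complete catalog of how people word their licenses.

```python
import re

# Illustrative phrase patterns only; real distributions word things many other ways.
LICENSE_PATTERNS = [
    ("perl_5", re.compile(r"same\s+terms\s+as\s+Perl", re.I)),
    ("artistic", re.compile(r"Artistic\s+License", re.I)),
    ("gpl", re.compile(r"GNU\s+General\s+Public\s+License", re.I)),
    ("lgpl", re.compile(r"GNU\s+(?:Lesser|Library)\s+General\s+Public\s+License", re.I)),
    ("bsd", re.compile(r"\bBSD\b")),
    ("mit", re.compile(r"\bMIT\s+License\b", re.I)),
]

def guess_licenses(text):
    """Return the labels of every pattern that appears in the text."""
    return [name for name, pattern in LICENSE_PATTERNS if pattern.search(text)]

def licenses_by_file(files):
    """files: {path: text} -> {path: [labels]}"""
    return {path: guess_licenses(text) for path, text in files.items()}

def has_mixed_licenses(files):
    """True when files with a detectable license disagree with each other."""
    found = {tuple(labels) for labels in licenses_by_file(files).values() if labels}
    return len(found) > 1
```

Running `licenses_by_file` over every file in a distribution, rather than only the README or META.yml, is also what would surface the mixed-license case: `has_mixed_licenses` flags a distribution as soon as two files yield different sets of labels.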

Friday August 29, 2008
12:48 PM

The Larry Wall Baseball Card

At YAPC::EU, Salve was handing out baseball cards of Larry Wall to advertise NPW 2009 in Oslo. If you didn't get to see them, now you can:

Front

Back

11:10 AM

brian's Guide now in Italian, along with other things

The people over at Perl.it translated my Guide to Solving Any Perl Problem into Italian.

In case you missed bepi's latest announcement, they are actually translating a lot more than just that. Install POD2::IT to get the perldocs in Italian.

% perldoc -L IT

They have a SourceForge project, and they just finished perlreftut, among others.

Besides Italian, there is also POD2::FR to get the perldocs in French. It looks like POD2::Base has a Klingon version. Anyone using that?

Are there any other translations of the perldocs out there?