
petdance (2468)

petdance
  andy@petdance.com
http://www.perlbuzz.com/
AOL IM: petdance
Yahoo! ID: petdance
Jabber: petdance@gmail.com

I'm Andy Lester, and I like to test stuff. I also write for the Perl Journal, and do tech edits on books. Sometimes I write code, too.

Journal of petdance (2468)

Tuesday December 01, 2009
11:53 AM

Spiteful spam

I know that a lot of people are moving their blogs over to http://blogs.perl.org/, leaving http://use.perl.org/ behind. Part of the frustration is that Chris Nandor (Pudge) hasn't done much to modernize use.perl.org, but hey, it's Pudge's choice, and he runs the site, and we're all here by grace of him running it. Beggars and choosers, y'know. If you're frustrated with a Perl news site, you can go start your own.

So certainly, I think this spam I just received is just out of line.

From: GreatestColonHealth <Kevin...@...by.com>
Subject: With This Astounding Cleanser You May Eliminate Pudge

That's just nasty!

Sunday April 06, 2008
02:04 AM

Rethinking the interface to CPAN

I've started a group, rethinking-cpan, for discussing the ideas I've posted here. -- Andy

Every few months, someone comes up with a modest proposal to improve CPAN and its public face. Usually it'll be about "how to make CPAN easier to search". It may be about adding reviews to search.cpan.org, or reorganizing the categories, or any number of relatively easy-to-implement tasks. It'll be a good idea, but it's focused too tightly.

We don't want to "make CPAN easier to search." What we're really trying to do is help with the selection process. We want to help the user find and select the best tool for the job.

It might involve showing the user the bug queue; or a list of reviews; or an average star rating. But ultimately, the goal is to let any person with a given problem find and select a solution.

"I want to parse XML, what should I use?" is a common question. XML::Parser? XML::Simple? XML::Twig? If "parse XML" really means "find a single tag out of a big order file my boss gave me", the answer might well be a regex, no? Perl's mighty CPAN is both blessing and curse. We have 14,966 distributions as I write this, but people say "I can't find what I want." Searching for "XML" is barely a useful exercise.

Success in the real world

Let's take a look at an example outside of the programming world. In my day job, I work for Follett Library Resources and Book Wholesalers, Inc. We are basically the Amazon.com for the school & public library markets, respectively. The key feature of the website is not ordering, but helping librarians decide what books they should buy for their libraries. Imagine you have an elementary school library, and $10,000 in book budget for the year. What books do you buy? Our website is geared to making that happen.

Part of this is technical solutions. We have effective keyword searching, so you can search for "horses" and get books about horses. Part of it is filtering, like "I want books for this grade level, and that have been positively reviewed in at least two journals," in addition to plain ol' keyword searching. Part of it is showing book covers, and reprinting reviews from journals. (If anyone's interested in specifics, let me know and I can probably get you some screenshots and/or guest access.)

BWI takes it even farther. There's an entire department called Collection Development where librarians select books, CDs & DVDs to recommend to the librarians. The recommendations could be based on choices made by the CollDev staff directly. They could be compiled from awards lists (Caldecott, Newbery) or state lists (the Texas Bluebonnet Awards, for example). Whatever the source, they help solve the customer's problem of "I need to buy some books, what's good?"

This is no small part of the business. The websites for the two companies are key differentiators in the marketplace. Specifically, they raise the company's level of service from simply providing an item to purchase to actually helping the customer do her/his job. There's no point in providing access to hundreds of thousands of books, CDs and DVDs if the librarian can't decide what to buy. FLR is the #1 vendor in the market, in large part because of the effectiveness of the website.

Relentless focus on finding the right thing

Take a look at the front of the FLR website. As I write this, the first thing a user sees on the page is "Looking for lists of top titles?" That link leads to a page of lists for users to browse. Award lists, popular series grouped by grade level, top video choices, a list called "Too good to miss," and so on. The entire focus that the user sees is "How can I help you find what you want?"

Compare that with the front page of search.cpan.org. Twenty-six links to the categories that link to modules in the archaic Module List. Go on, tell me what's in "Control Flow Utilities," I dare you. Where do I find my XML modules? Seriously, read through all 26 categories without laughing and/or crying. Where would someone find Template Toolkit? Catalyst? ack? Class::Accessor? That one module that I heard about somewhere that lets me access my Lloyd's bank account programmatically?

Even if you can navigate the categories, it hardly matters. Clicking through to the category list leads to a one-line description like "Another way of exporting symbols." Plus, the majority of modules on CPAN are not registered in the Module List. The Module List is an artifact a decade old that has far outlived its original usefulness.

What can we do?

There have been attempts, some implemented, some not, to do many of these things that FLR & BWI do very effectively. We have CPAN ratings and keyword searching, for example. BWI selects lists of top books, and Shlomi Fish has recently suggested having reviews of categories of modules, which sounds like a great idea. I made a very tentative start on this on perl101.org. But it's not enough.

We need to stop thinking tactical ("Let's have reviews") and start thinking strategic ("How do we get the proper modules/solutions in the hands of the users that want them?"). Nothing short of a complete overhaul of the front end of the CPAN will make a dent in this problem. We need a revolution, not evolution, to solve the problem.

Monday March 24, 2008
10:45 PM

ack 1.78 is out

After three months of lots of development work and intermediate releases, I've released ack 1.78. There are tons of new features and lots of compatibility fixes for Windows. ack is a replacement for grep that is geared to working with trees of code.

Highlights in this release include:

  • Files specified on the command line are always searched, even if they don't match a known filetype
  • Ability to ignore directories
  • Pager support
  • More flexible grouping options
  • Many more languages recognized and existing ones improved, including CFMX, Actionscript, assembly, Tcl, Lisp, Smalltalk
  • Ability to define your own languages based on filetype

ack may well change the way you work on the command-line with source code. Try it out and let me know what you think. You can install it by installing App::Ack from CPAN, or downloading the standalone version to your ~/bin directory.

Monday March 03, 2008
11:26 AM

Who will take over perl101.org?

Who out there has some free time and is interested in helping out beginners?

My little project for Perl beginners, Perl101.org, has been largely ignored lately. My idea was to have a cookbooky style set of pages that beginners could read for things like the right way to get a count of elements in an array, or how to extract links from a web page without using a regular expression.

It started out pretty nicely, but has lain fallow for months now. I'd like it if someone could take it over. I'll hand over the domain name and the Google Code project for the site, and you'll keep this going and make it something more useful. If you want to overhaul how it works, or keep the same system going, it doesn't matter to me. All I require is that you'll do something useful for the beginners.

Any interest?

Thursday November 22, 2007
02:00 PM

Perl gratitude, 2007

Here in the US, it's Thanksgiving, a day of eating lots of food, watching football, and sometimes, just sometimes, expressing gratitude and giving thanks for those things that make life wonderful.

Here are the things I'm grateful for in late 2007, in no particular order after the first.

Google Code

Google's project hosting service has been a godsend. It's changed the way I do open source projects. It has leapfrogged SourceForge for ease of maintenance, and the bug tracker trumps RT for CPAN that we've been using for so long. Add that to the integration with Google Groups which makes it trivial to create mailing lists, and it's at the top of my list for 2007. I can't say enough good about it.

The readers of Perlbuzz

Eleven weeks ago, Skud and I started this little website called Perlbuzz as an alternative to the "more traditional outlets" for news in the Perl world. The response has been tremendous. We get 600 RSS readers every day, and have had over 10,000 unique visitors in that time. It makes me happy that our little venture is used and appreciated by the community.

Test::Harness 3.0

It's been over a year in the making, but the new 3.0 release of the crucial Test::Harness means more flexibility for module authors, and lots of UI improvements for people who just want to run prove and make test.

Mark Dominus

MJD is so much a fixture in Perl it's easy to forget that he's there. For 2007, though, never mind all the things he's done for Perl in the past, or the hours I've spent being enthralled in talks of his. His Universe Of Discourse blog is the single most intelligent blog out there, and sometimes it just happens to be about Perl.

Andy Armstrong

Was Andy Armstrong always around, or did I just not notice? His time and dedication spent on climbing on board with Ovid and Schwern and the rest of the Test::Harness 3.0 crew has been invaluable in getting it out. Plus, he's a really swell guy anyway.

Dave Hoover

When I finally despaired of the amount of time and frustration it took to organize content for Chicago.pm's Wheaton meetings, Dave Hoover stepped up and volunteered to take it over. I'm thankful, but not as much as I hope the other Chicago.pm folks are.

Perl::Critic

I'm all about having the machine keep an eye out for the stupid things we do, and the goodness of Perl::Critic is always impressive. You won't like everything Perl::Critic says about your code, but that's OK. It's an entire framework for enforcing good Perl coding practices.

The Perl Community in general

The Perl community is populated by some tremendous folks. Some names are more known than others, but these people help make daily Perl life better for me. In no particular order, I want to single out Pete Krawczyk, Kent Cowgill, Elliot Shank, Liz Cortell, Jason Crome, Yaakov Sloman, Michael Schwern, Andy Armstrong, Ricardo Signes, Julian Cash, Jim Thomason, chromatic, Chris Dolan, Adam Kennedy, Josh McAdams and of course Kirrily Robert. If you think you should be on this list, you're probably right, and I just forgot.

My wife, Amy Lester

Because even if she doesn't understand this part of my life, she at least understands its importance to me.

I'd love to hear back from anyone about what they're thankful for. I'm thinking about having a regular Perlbuzz "Love Letters to Perl" column where people write about what they love in Perl.

Sunday November 04, 2007
10:50 PM

ack 1.70 adds context and line-specific matching

ack, my replacement for grep for 95% of the times programmers use grep, just got released to CPAN with version 1.70.

At long last, you can now get contextual lines before and after matched lines, just like GNU grep's -A, -B and -C options. You can also match on a specific line number or range of line numbers with the new --line option. For example, if you want to see the first line of every Perl file in a tree, you'd just do ack --line=1 --perl. Thanks very much to Torsten Blix for putting both these features together for me.
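
If you haven't used grep's context flags before, here's a rough sketch of what those two features give you. It builds a throwaway file (the name sum.pl and its contents are made up for the demo) and uses plain grep and head as stand-ins, so it runs even where ack isn't installed. The ack commands in the comments follow the description above; exact option spelling may vary between ack versions.

```shell
# Throwaway demo file; sum.pl and its contents are invented for illustration.
demo=$(mktemp -d)
cat > "$demo/sum.pl" <<'EOF'
#!/usr/bin/perl
use strict;
my $total = 0;
print "done\n";
EOF

# Context: grep -C 1 prints one line on either side of each match.
# The ack counterpart, going by the post: ack --context=1 total
context=$(grep -C 1 'total' "$demo/sum.pl")
printf '%s\n' "$context"

# First line of every Perl file, with head standing in for:
#   ack --line=1 --perl
first_lines=$(for f in "$demo"/*.pl; do head -n 1 "$f"; done)
printf '%s\n' "$first_lines"
```

The first printf shows the matching line with its neighbours on either side; the second shows just the shebang line of the one Perl file in the demo directory.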

Finally, Elliot Shank pointed out that one of my favorite features, the -1 option, was never documented. Now it is. The -1 option says "stop after the first match of any type." If you find yourself acking for lines, or searching for a specific file with ack -g and then having to Ctrl-C to stop the search process, just add a -1 and Ctrl-C no longer.

ack is available in the ack distribution on CPAN, or by installing the module App::Ack from the CPAN shell. You can also download the single-file version direct from Subversion and drop it right into your ~/bin directory.

Wednesday October 31, 2007
12:13 PM

New WWW::Mechanize and Test::WWW::Mechanize spiffiness

For those of you using Mech for your testing of your website:

    $agent->content_contains( qr/\QEnter keyword(s)/ )
        or $agent->dump_all( \*STDERR );

not ok 14 - Content contains '(?-xism:Enter\ keyword\(s\))'
#   Failed test 'Content contains '(?-xism:Enter\ keyword\(s\))''
#   at t/simple-search.t line 31.
#     searched: "<HTML>\x{0a}<HEAD>\x{0a}<TITLE>TitleTales&#153;</TITLE></HEA"...
#   can't find: "(?-xism:Enter\ keyword\(s\))"
/buttonsd/bisac2.gif
/graphics/bar.gif
POST http://hoops.flr.follett.com:2112/simpsearch.php [simsearch]
  clickval=                      (hidden readonly)
  searchwords=                   (text)
  S=<UNDEF>                      (checkbox) [*<UNDEF>/off|on/Include Out of Print / Please Order Direct Titles]

No longer do you have to do a $mech->save_content() and then run mech-dump on it. How has it taken me so long to put this stuff in there?

Friday October 12, 2007
10:51 AM

Evolution requires mutation

(Originally posted at http://perlbuzz.com/2007/10/evolution-requires-mutation.html)

In the past couple of days, I've seen a counterproductive social behavior that helps scare away community members and leads to boring monoculture: taking a public dump on the projects of others when they do not directly affect you. It's rude, it discourages future risk taking in everyone, and it goes against the very nature of open source that has brought us here today. I'd like people to stop.

Mutation #1: kurila

Gerard Goossen recently released kurila, a fork of Perl 5 that includes some speedups and tweaks that seem to scratch Gerard's itches, as well as bundling extra modules. I'm right now trying to get an interview with him to find out more about his project and the reasons behind it, because there are probably some interesting lessons in there. However, the disapproval on the Perl 5 Ports list was swift and severe.

All forking based on the Perl 5 syntax and code base, throwing away CPAN compatibility, seems to me to be a complete worthless waste of time.

So what? Who is anyone to say how Gerard is to use his time? Is there any harm here? No? Then leave the guy alone, please.

Mutation #2: lambda

Eric Wilhelm released lambda, a distribution that lets you use the Greek character lambda (λ) as an alias for sub {...}, apparently as a nod to Python's lambda keyword for anonymous functions. Immediately people jumped on him saying that the module should go into the Acme:: namespace, as if the namespaces of CPAN mean anything in 2007. There was also this cluck-cluck from someone I figured would be more encouraging (and who later apologized, as it turns out):

Well, if you want to use it in your own code and your work's code, that's fine (because I'm sure you find typing CONTROL-SHIFT-EL so much easier than "sub {}" :) but if it shows up in your CPAN modules, you might get a few complaints since this sugar, while a really nifty hack, adds nothing complex but does screw up older editors and will confuse the heck out of a lot of maintenance programmers.

Personally, I figure that if someone's a smart enough programmer to do a hack like the lambda module, he or she is also smart enough to figure out potential downsides. And so what if he doesn't? What's the harm here?

Mutation #3: perlbuzz

Perlbuzz itself has always come under this umbrella of disapproval. Even before we announced the site, Skud and I were fending off comments saying "We already have use.perl.org, we don't need Perlbuzz." Maybe not, but why do you care if we start the site? Why does it bother you? And why do you find it necessary to tell us that we're embarking on a waste of time?

I hope that in the past few months, the work that Skud and I have done has shown you, the reader, that Perlbuzz is a worthwhile addition to the Perl community, and a valuable news source that overlaps other news sources while not being a subset. What if Skud and I had listened to the tsk tsk of the doubters? Perl would be right where it was before, with nothing new.

Evolution requires mutation

Why are we so quick to take a dump on the projects of others? The only way anything interesting happens is that people try weird, new things and see what sticks. What if Larry had listened to those way back when who said "Ah, we've got Awk and shell tools, we don't need Perl?"

I fear our tendency to monoculture. I want crazy new projects to thrive, not get squashed at their very infancy. Next time someone comes out with a project that you think is silly, congratulate the person rather than scoffing at it. Who knows what it might lead to?

(And a big thank you to Jim Brandt for the "Evolution requires mutation" idea.)

Friday August 24, 2007
10:56 PM

Today's snazzy ack trick

I'm going through a codebase that's got a ton of unused files that have never been pruned. I use ack to look to see if a given file is used, and if not, svn rm the sucker. Then the domino effect starts. Removing that file means that there may well be others, both HTML and graphic, that are no longer used, too. Here's my handy tool to make that easier:

svn diff | grep ^- | ack '(href|src)="(.+?)"' --output='$2' | sort -u

Get the diff, only look at the lines where something's been removed, then find href= or src=, and only show what's in the parens, then sort and dedupe. Voila!
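
To see what each stage of that pipeline does without touching a real working copy, here's a sketch that runs on a made-up diff. The sample diff and the function names sample_diff and removed_refs are invented for the demo, and grep -o plus sed stand in for ack's --output so it runs where ack isn't installed. This stand-in prints every href or src found on a removed line, which may differ from ack when one line holds several matches.

```shell
# Hypothetical stand-in for `svn diff`: a tiny hand-written diff.
sample_diff() {
cat <<'EOF'
--- index.html (revision 10)
+++ index.html (working copy)
-<a href="about.html">About</a>
-<img src="logo.gif">
+<a href="contact.html">Contact</a>
 <a href="kept.html">Kept</a>
-<td><img src="logo.gif"></td>
EOF
}

# Keep only removed lines, pull out each href="..." or src="...",
# strip everything but the quoted value, then sort and dedupe.
removed_refs() {
    grep '^-' |
    grep -oE '(href|src)="[^"]+"' |
    sed -E 's/^(href|src)="(.*)"$/\2/' |
    sort -u
}

sample_diff | removed_refs
```

On the sample this lists about.html and logo.gif once each, and leaves out contact.html and kept.html because those lines weren't removed. Against a real checkout you'd pipe svn diff into removed_refs instead of sample_diff.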

Friday August 10, 2007
10:00 AM

Perl security done right

I just read the security chapter in Mastering Perl. It should be required reading for anyone who does any Perl programming for the web. brian's discussion of tainting is the best I've seen yet. I only wish he'd mentioned tainting and DBI.