All the Perl that's Practical to Extract and Report

petdance (2468)

petdance
  andy@petdance.com
http://www.perlbuzz.com/
AOL IM: petdance
Yahoo! ID: petdance
Jabber: petdance@gmail.com

I'm Andy Lester, and I like to test stuff. I also write for the Perl Journal, and do tech edits on books. Sometimes I write code, too.

Journal of petdance (2468)

Saturday February 14, 2004
02:38 PM

Perl haiku contest, and today's 5-minute hack

[ #17418 ]
ActiveState announced the winners of their Perl haiku contest. The funny thing was that I didn't know any of the names shown, and there sure seemed like a lot of duplicates. Plus, there was a Dishonorable Mention for this entry:

Unreadable code,
Why would anyone use it?
Learn a better way.

Here's why I use it: Because I can write a program to summarize the winners on the web page in 5 minutes.

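use strict;
use warnings;
use WWW::Mechanize;

# autocheck makes Mechanize die if a request fails
my $mech = WWW::Mechanize->new( autocheck => 1 );

$mech->get( "http://aspn.activestate.com/ASPN/Perl/Haiku/AboutPerl" );

# Capture every author name between "Name: " and the next <BR>
my @names = ($mech->content =~ /Name: (.+?)<BR/igm);
my %count;
++$count{$_} for @names;

# Most frequent names first; ties broken alphabetically, ignoring case
for my $key ( sort { $count{$b}<=>$count{$a} || lc $a cmp lc $b } keys %count ) {
    printf "%3d: %s\n", $count{$key}, $key;
}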
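
The extract-and-tally step can be tried without touching the network. What follows is only a minimal sketch under stated assumptions: the HTML snippet and the names in it are made up, and the helpers tally_names and ranked_names are hypothetical names introduced here, not part of the original hack or of WWW::Mechanize.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contest page; names and markup are made up.
my $html = <<'HTML';
Name: Alice Example<BR>
Haiku one<BR>
Name: bob sample<BR>
Haiku two<BR>
Name: Carol Demo<BR>
Haiku three<BR>
Name: Alice Example<BR>
Haiku four<BR>
HTML

# Tally how often each captured name occurs in the given HTML.
sub tally_names {
    my ($page) = @_;
    my %count;
    # In list context, a /g match returns every captured group.
    ++$count{$_} for $page =~ /Name: (.+?)<BR/ig;
    return \%count;
}

# Order names by descending count, then alphabetically ignoring case.
sub ranked_names {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} || lc $a cmp lc $b } keys %$count;
}

my $count = tally_names($html);
printf "%3d: %s\n", $count->{$_}, $_ for ranked_names($count);
```

On this sample, "Alice Example" comes first with a count of 2. The two single entries tie, and the lc comparison puts "bob sample" ahead of "Carol Demo", which a plain cmp would not do because uppercase letters sort before lowercase ones.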

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Thanks for the sample code. Tried it. Worked on Cygwin (beats ActiveState, and I don't have a *nix box online) after a bit of buggerising around with CPAN modules.

    One question though - why is it better to use regexes to extract data rather than an HTML parser?

    I ask this as I'm currently using regexes to extract data from Google, and writing and debugging them takes up the most time.

    --
    bootload [netspace.net.au], groking softwa
    • It's better in this case because I wanted a quick (5 minutes, remember) and dirty solution to my problem.

      If you're extracting data from Google, you may want to look at the link functions in WWW::Mechanize, anyway. It does a lot of the parsing for you.

      --
      xoa

  • I needed to know who subscribed to and unsubscribed from the Groo [groo.com] mailing list in the old days. The online archives list every mail received by the list owner between 1995 and 1997, including subscription and unsubscription notifications.

    Here's the hack:

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;

    $|++;
    my $bot = WWW::Mechanize->new;
    $bot->get('http://www.groo.com/mail/');
    my @links =
      $bot->find_all_links( text_regex => qr/^(?:UN)?SUBSCRIBE groo-l$/ );

    my ( $sub, $uns ) = ( 0, 0 );
    for (revers