All the Perl that's Practical to Extract and Report

jjohn
http://taskboy.com/
AOL IM: taskboy3000

Perl hack/Linux buff/OSS junkie.

Journal of jjohn (22)

Thursday January 20, 2005
01:59 PM

Scraping weather

[ #22809 ]

As I continue to recover from my own personal IT nightmare, I have made a few new improvements in order to replace those conveniences that have been fsck'ed away.

This entry, for instance, was made entirely on a stoopid XP box using emacs and win32 perl. The code for this is substantially the same as what I published in that article for use.perl.org about the SOAP interface. So now I can bloginate again, much to the relief of all my attentive readers.

Here I present a small, somewhat naughty utility that replaces the more elegant bash shell hack I used to use to spit out my local weather forecast. It scrapes Weather Underground and reports the 5-day forecast on the command line, without ads or other distractions. You will need to change the ZIP code, should you wish to use this program yourself.

use strict;
use LWP::UserAgent;
use Text::Wrap;

my $zip = "02215";
my $url = qq[http://www.wunderground.com/cgi-bin/findweather/getForecast?query=$zip];

my $ua = LWP::UserAgent->new;
$ua->agent(q[Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)]);
my $res = $ua->request(HTTP::Request->new(GET=>$url));

unless ($res->is_success) {
    die "Can't fetch $url: ", $res->code, "\n";
}

my $content = $res->content;

my $updated = "";
my $in_rec;
for $_ (split /\n/, $content) {
    if ($updated) {
        if (m!width="100%" ><b>([^<]+)</b><br>!) {
            $in_rec = $1;
            next;
        }

        if ($in_rec) {
            if (/<br>/) {
                # out of record
                print "$in_rec\n";
                undef($in_rec);
            } else {
                $in_rec .= ":\n" . wrap("\t", "\t", $_) . "\n";
            }
        }
    } else {
        if (m!Updated: <b>([^<]+)</b>\s*$!) {
            $updated = $1;
            print "Updated: $updated\n";
        }
    }

}
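
The loop above prints as it goes, which makes it hard to check without hitting the live site. What follows is a rough sketch, not part of the original program, that pulls the same line-by-line matching into a sub that returns its results. The names forecast_url and parse_forecast are made up for illustration, and the sample markup is a guess reconstructed from the regexes above, not a captured page. Unlike the original, it gathers each record's text lines into a list rather than appending ":\n" before every line.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: build the lookup URL for a 5-digit ZIP code,
# so the ZIP can come from the command line instead of the source.
sub forecast_url {
    my ($zip) = @_;
    die "Expected a 5-digit ZIP code, got '$zip'\n"
        unless $zip =~ /\A\d{5}\z/;
    return "http://www.wunderground.com/cgi-bin/findweather/getForecast?query=$zip";
}

# Hypothetical helper: same matching rules as the loop above, but the
# results are returned instead of printed.
sub parse_forecast {
    my ($html) = @_;
    my ($updated, $title, @lines, @records);
    for my $line (split /\n/, $html) {
        if (!defined $updated) {
            # Nothing counts until the "Updated:" stamp has been seen.
            $updated = $1 if $line =~ m!Updated: <b>([^<]+)</b>\s*$!;
            next;
        }
        if ($line =~ m!width="100%" ><b>([^<]+)</b><br>!) {
            # Start of a forecast record, e.g. "Tonight".
            $title = $1;
            @lines = ();
            next;
        }
        next unless defined $title;
        if ($line =~ /<br>/) {
            # A bare <br> closes the current record.
            push @records, { title => $title, text => join("\n", @lines) };
            undef $title;
        } else {
            push @lines, wrap("\t", "\t", $line);
        }
    }
    return ($updated, \@records);
}

# Assumed sample input, shaped only to satisfy the regexes above.
my $sample = join "\n",
    '<td>Updated: <b>1:54 PM EST on January 20, 2005</b>',
    '<td width="100%" ><b>Tonight</b><br>',
    'Partly cloudy. Low of 18F.',
    '<br>';

my ($updated, $records) = parse_forecast($sample);
print "Updated: $updated\n";
print "$_->{title}:\n$_->{text}\n" for @$records;
```

To run it against the real page, pass the fetched $res->content to parse_forecast and build the URL with forecast_url(shift @ARGV). Whether the site's markup still matches these patterns is anyone's guess.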

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Hey, I used to live in 02215. I lived at 514 Park Street and 52 Buswell, both on Boston U's South Campus.

    Um. That's all. Well, also, scraping wunderground was one of the first screen scraping things I ever did. (Scrape, btw, not scrap.)

    OK BYE
    --
    rjbs