Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Sunday November 24, 2002
02:43 PM

Which cookies do you want to eat today?

[ #9121 ]

The use.perl system lets people designate other people as their "friends". I wanted to see my friend list, but I also wanted to avoid using a web browser. As I have in previous entries, I created a script to do this. Unlike previous scripts that interact with use.perl, I need to tell use.perl who I am. Everything I need to know is stored in my browser's cookie file, which I load with HTTP::Cookies.

#!/usr/bin/perl -w
use strict;
 
use File::Spec::Functions;
use HTTP::Cookies;
 
# set up the transaction
use LWP::Simple qw($ua get);
my $cookie_file = $ENV{COOKIE_FILE};
my $cookie_jar  = HTTP::Cookies::Netscape->new( file => $cookie_file );
$ua->cookie_jar( $cookie_jar );
 
my $url = 'http://use.perl.org/my/fans';
my $data = get( $url );
 
my $Counter     = $ENV{USE_PERL_JOURNAL_FRIENDS}  ||
    catfile( $ENV{HOME}, ".use_perl_friends" );
 
dbmopen my %hash, $Counter, 0640 or die $!;
 
# scrub the HTML
$data =~ s|.*!-- start template: ID 249.*?<tr>\s*||si;
$data =~ s|\s*</tr>\s*</table>.*||si;
$data =~ s|\s*<td VALIGN="TOP">\s*||gis;
$data =~ s=\s*</?(?:a|em|img|b).*?>==gis;
$data =~ s|\s*</tr>\s+<tr>\s*|\f|gis;
 
# extract the names and user-ids from what's left
my @b =
map { [ reverse m|(\S+)\s+\((\d+)\)|g ];  }
map { [ split m|\s*</td>\s*|i ]->[0] }
split /\f/, $data;
 
# output the results
my $count = 1;
foreach my $array ( @b )
    {
    $hash{ $array->[1] }++;
    printf "%3d: %5d %-15s", $count++, @$array;
    print "\t<--- NEW" if $hash{ $array->[1] } == 1;
    print "\n";
    }

A lot of people do not realize that LWP::Simple will expose its user agent object so I can change it. I do not need to use LWP::UserAgent or HTTP::Request directly---I just import $ua and add a cookie jar to it. LWP::Simple still handles all of the thinking for me. I usually do this if I only need to affect the user agent and not the request.

The cookie jar is a problem though. HTTP::Cookies recognizes two formats---an internal one and the Netscape format. I; however, use Mozilla, the open source descendent of Netscape Communications Corporation's Navigator. Netscape tagged their cookie files on the first line:

# Netscape HTTP Cookie File

Mozilla is not assoicated with Netscape Communication Corporation (since it went way a long time ago). It uses the same cookie file format, but its first line omits "Netscape":

# HTTP Cookie File

The HTTP::Cookies::Netscape subclass tries to recognize the cookie format by reading the first line, but it does too much work in its load method. I have patch for this.

# HTTP/Cookies.pm version 1.25, in HTTP::Cookies::Netscape::load
746     unless ($magic =~ /^\# Netscape HTTP Cookie File/) {
747     warn "$file does not look like a netscape cookies file" if $^W;
748     close(FILE);
749     return;
750     }

The HTTP::Cookies::new method calls load too soon. It uses the value of $class to determine the file format, which means that the programmer has to choose the right format by choosing the right class name. This tightly couples the code to a particular use. If I change the cookies file format, the code changes.

78 sub new
79 {
80     my $class = shift;
81     my $self = bless {
82     COOKIES => {},
83     }, $class;
84     my %cnf = @_;
85     for (keys %cnf) {
86     $self->{lc($_)} = $cnf{$_};
87     }
88     $self->load;
89     $self;
90 }

The solution to this a Facade, like I wrote about in The Facade Design Pattern.

The programmer should not have to think about the cookie file format. If one user wants to use his Netscape Navigator cookies, another wants to use Mozilla, another Opera, and so on, they should be able to do that without code surgery.

The trick is to recognize the file format before new() creates the object. One way to do this is create a hash of file first lines and class names to instantiate. I have truncated some of the lines in %Classes.

my %Classes = (
    '# Netscape HTTP Cookie File' => 'HTTP::Cookies::Netscape',
    '# HTTP Cookie File'          => 'HTTP::Cookies::Netscape',
    '#LWP-Cookies-...'            => 'HTTP::Cookies',
    '<?xml version="1.0"...'      => 'HTTP::Cookies::XML',
    );

Once new() determines the type of file format, it creates the object in the appropriate class. The load method no longer has to know how to recognize the file format, although it can still return an error if the file format is not right.

sub new
    {
    my $class = shift;
    my %args  = @_;
    ...;
 
    my $self = ...;
 
    my $first_line = do {
        if( exists $args{File} )
            {
            open my $fh, $args{File};
            <$fh>;
            }
        };
 
    my $real_class = $Classes{ $first_line };
 
    bless $self, $real_class;
    }

Once the object is in the right class, every method calls go through that class. Each file format can override HTTP::Cookies's save() and load() method to write the correct file format and the user is now the wiser. As long as you have the sub-class installed, everything is peachy.

This works much better with automatic cookie file discovery, which I am still am thinking about. Mozilla has a really wierd way to do this---my personal cookie file is /Users/brian/Library/Mozilla/Profiles/default/4o04imb5.slt/cookies.txt, which is mostly easy to guess save for the profile name and next subdirectory. OmniWeb stores it differently (and in XML), and other platforms are different yet again. Some browsers do not even use a single file. Even after all of that, I still have to figure out which browser I want to use from the six I have installed.

Yes, web standards are fun.