I am writing a little spider application, using LWP and all of that good stuff. For my particular application I need to set the Referer header, and along the way I collect the right URLs to put in it.
Since I am using LWP, URLs tend to show up as objects, but when I try to put them back into an HTTP request, things blow up:
use HTTP::Request;
use URI;
my $url = URI->new( 'http://www.example.com' );
my $request = HTTP::Request->new( GET => "http://www2.example.com" );
$request->referer( $url );
The referer() method comes from HTTP::Headers, and all it does is pass its arguments to the _header() method. Inside _header(), that $url ends up in $val, and then it has to run the gauntlet:
[HTTP::Headers, 1.43, sub _header]
if (defined($val)) {
    my @new = ($op eq 'PUSH') ? @old : ();
    if (!ref($val)) {
        push(@new, $val);
    } elsif (ref($val) eq 'ARRAY') {
        push(@new, @$val);
    } else {
        Carp::croak("Unexpected field value $val");
    }
    $self->{$lc_field} = @new > 1 ? \@new : $new[0];
}
The thing in $val is defined, so it makes it into the block. It is a reference, though not an ARRAY reference, so it falls through to the else {} and croaks. This works for most things, because _header is a generic method, but referer could be a bit smarter.
[HTTP::Headers, 1.43, referer()]
sub referer { (shift->_header('Referer', @_))[0] }
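The check that rejects the object can be reproduced in isolation. The sketch below is not the library's code: check_value() is a hypothetical stand-in that copies only the ref() test from the listing above, so it behaves the same way no matter which version of HTTP::Headers is installed.

```perl
use strict;
use warnings;
use Carp ();
use URI;

# Hypothetical stand-in for the value check quoted from HTTP::Headers 1.43.
# It is not part of any module; it only mirrors the ref() test.
sub check_value {
    my ($val) = @_;
    my @new;
    if ( !ref($val) ) {
        push @new, $val;                 # plain scalar: accepted
    }
    elsif ( ref($val) eq 'ARRAY' ) {
        push @new, @$val;                # array reference: flattened
    }
    else {
        # Any other reference, including a URI object, lands here
        Carp::croak("Unexpected field value $val");
    }
    return @new > 1 ? \@new : $new[0];
}

my $url = URI->new('http://www.example.com');

# The stringified form passes the check
my $ok = check_value("$url");

# The object itself croaks, so trap the error with eval
my $err = eval { check_value($url); 1 } ? '' : $@;

print "string: $ok\n";
print "object: $err";
```

Note that the error message itself stringifies the object, which is part of why the failure looks so innocent.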
Debugging this was a pain. The URI objects automatically stringify, so printing them just shows the string form, rather than something like "URI=HASH(0xfb748)". My usual debugger, print(), fails to pick this up.
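For anyone chasing the same thing, here is a small sketch of how to see past the overloaded stringification. It uses only ref(), Scalar::Util's blessed(), and overload::StrVal(), none of which appear in the original post, so treat it as one possible approach rather than what I actually did at the time.

```perl
use strict;
use warnings;
use overload ();
use Scalar::Util qw(blessed);
use URI;

my $url = URI->new('http://www.example.com');

# print() sees the overloaded string form, so the object looks like a string
my $printed = "$url";
print "as printed: $printed\n";

# ref() and blessed() ignore the overloading and report the class
my $class = ref($url);
print "ref():      $class\n";
print "blessed():  ", blessed($url), "\n";

# overload::StrVal() returns the raw, un-overloaded reference string
my $raw = overload::StrVal($url);
print "StrVal():   $raw\n";
```

A quick ref() check on a suspicious scalar is cheap, and it answers the question that print() hides.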
There are a couple of ways around this, none of them satisfying:
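One workaround of this kind, sketched here as an assumption rather than a reconstruction of any particular list, is to hand referer() a plain string instead of the object. Both forms below rely only on URI's documented stringification and as_string() method.

```perl
use strict;
use warnings;
use HTTP::Request;
use URI;

my $url     = URI->new('http://www.example.com');
my $request = HTTP::Request->new( GET => 'http://www2.example.com' );

# Option 1: interpolate the object to force the overloaded stringification
$request->referer("$url");

# Option 2: ask the object for its string form explicitly
$request->referer( $url->as_string );

# Either way, the header now holds a plain string, not a reference
my $referer = $request->referer;
print "Referer: $referer\n";
```

Neither is pretty, because the caller has to remember to do it at every call site.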
Oh well, now you know. Do not pull your hair out over this one, because I already did.
print and Data::Dumper (Score:2, Insightful)
I much prefer visually scanning through Dumper($foo) to clicking through some elaborate tree view in a GUI debugger.
Re:print and Data::Dumper (Score:3, Informative)
I was using Data::Dumper in a lot of places, but by the time I thought to see what was in the scalar variable (usually not a candidate for a Dumper() call), I knew what the problem was.
Indeed, there were all sorts of signs of what was happening, but everything got clouded because my starting assumption was wrong: I assumed URI objects would always do the right thing with LWP, and that was not the case.
Threads doesn't like URI's object stuff (Score:3, Interesting)
--
xoa
Re:Threads doesn't like URI's object stuff (Score:3, Informative)
This is fixed in HTTP::Headers 1.47 (Score:3, Interesting)
I thought I had updated LWP when I got home, but that is what I get for thinking.