
Journal of miyagawa (1653)

Thursday October 25, 2007
02:18 PM

Web::Scraper hacks #3: Read your browser's cookies

[ #34754 ]

Some websites require you to log in with your credentials before you can view their content. That's easy to script with WWW::Mechanize, but if you visit the site frequently in your browser, why not reuse the browser's cookies so that you don't need to script the login process at all?

Web::Scraper lets you call methods on its UserAgent object, or swap that object out entirely, before it scrapes a website. Here's how:

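use Web::Scraper;
use HTTP::Cookies::Guess;
use URI;

# Load the cookies your browser has already stored
my $cookie_jar = HTTP::Cookies::Guess->create(file => "/home/miyagawa/.mozilla/cookies.txt");
my $uri = URI->new("http://example.com/");  # placeholder: the login-protected page you want to scrape
my $s = scraper { };  # put your scraping rules inside the block
$s->user_agent->cookie_jar($cookie_jar);
my $result = $s->scrape($uri);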

This snippet uses HTTP::Cookies::Guess, which provides a common API for reading browsers' cookie files (the module supports IE, Firefox, Safari and w3m), and sets the resulting cookie jar on the UserAgent object.

If you'd like to change the behavior globally, you can also do:

$Web::Scraper::UserAgent->cookie_jar($cookie_jar);

Either way, you avoid hardcoding your username and password in the scraping script, which is a huge win.
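
The snippets above only call a method on the existing UserAgent. Swapping the object entirely could look roughly like the sketch below. It assumes the user_agent accessor also works as a setter when handed an object, and the agent string and timeout are made-up values for illustration; the cookie file path is the one used earlier, so adjust it for your machine.

```perl
use strict;
use warnings;
use Web::Scraper;
use HTTP::Cookies::Guess;
use LWP::UserAgent;

# Read the browser's cookie file (same path as in the post).
my $cookie_jar = HTTP::Cookies::Guess->create(file => "/home/miyagawa/.mozilla/cookies.txt");

# Build our own UserAgent that carries those cookies.
my $ua = LWP::UserAgent->new(
    agent      => "my-scraper/0.1",   # hypothetical User-Agent string
    timeout    => 30,                 # hypothetical timeout, in seconds
    cookie_jar => $cookie_jar,
);

my $s = scraper { };      # put your scraping rules inside the block
$s->user_agent($ua);      # assumption: passing an object replaces the UserAgent for this scraper
```

Since the replacement is attached to $s, other scrapers in the same program should keep using the default $Web::Scraper::UserAgent.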
