All the Perl that's Practical to Extract and Report

Journal of LTjake (4001)

Wednesday July 23, 2003
07:38 AM

handling HTTP response codes

[ #13641 ]

After Mark's post yesterday on how Atom aggregators should handle HTTP response codes, I decided to bring our (currently in-house-only) RSS aggregator up to speed. I think it's safe to assume that RSS and Atom aggregators should handle HTTP in the same manner.

The structure of the aggregator is a little weird because it's a server-based "what's new?" type app and not a desktop/personal news reader.

There are two main parts to the app that read a feed.

  1. the script (scheduled to run every couple of hours)
  2. the actual web app which reads only from the cache

So, I really only had to deal with the script, since it's the only part that actually talks to external sites.

I was able to check a few of the requirements off right away.

  • It already handled ETags and Last-Modified dates (even though it's in-house only right now, we want it to play nice with external servers when the time comes)
  • Referer and User-Agent were set appropriately.
  • https is supported (simply by installing Crypt::SSLeay)
  • A requirement not on the list, but important to note, is that it supports the file:// protocol, so all local feeds can be read straight off the file system (it also returns a Last-Modified date! LWP rocks!)
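
To make the ETag / Last-Modified item concrete, here is a minimal sketch of a conditional GET. The helper name conditional_request and the shape of the cache record (etag, last_modified keys) are made up for illustration; they are not the aggregator's real code.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper: build a GET that lets the server answer 304
# when nothing has changed since the cached copy was fetched.
# $cached is an assumed cache record: { etag => ..., last_modified => ... }
sub conditional_request {
    my ( $url, $cached ) = @_;
    $cached ||= {};

    my $request = HTTP::Request->new( GET => $url );
    $request->header( 'If-None-Match' => $cached->{etag} )
        if defined $cached->{etag};
    $request->header( 'If-Modified-Since' => $cached->{last_modified} )
        if defined $cached->{last_modified};

    return $request;
}

# Example: values as they might have been saved from the previous response.
my $request = conditional_request(
    'http://example.org/feed.rss',
    { etag => '"abc123"', last_modified => 'Tue, 22 Jul 2003 10:00:00 GMT' },
);
print $request->as_string;
```

Handing that request to an LWP::UserAgent and getting a 304 back would mean the cached copy is still good and there is nothing to parse.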

Adding gzip and deflate support was a snap. First, I send the right header with my request:

$request->header( 'Accept-Encoding' => 'gzip, deflate' );

Handling the returned data was too easy:

if ( my $encoding = $response->header( 'Content-Encoding' ) ) {
   require Compress::Zlib;

   # gzip (and x-gzip) bodies arrive in a gzip wrapper
   $data = Compress::Zlib::memGunzip( $data ) if $encoding =~ /gzip/i;
   # "deflate" bodies are zlib streams
   $data = Compress::Zlib::uncompress( $data ) if $encoding =~ /deflate/i;
}
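
For anyone who wants to try the decoding step without a server, here is a small self-contained sketch. decode_body is a hypothetical helper name, and the sample feed string is made up; the round trip just shows which Compress::Zlib function pairs with which Content-Encoding value.

```perl
use strict;
use warnings;
use Compress::Zlib ();

# Hypothetical helper: undo the Content-Encoding named in $encoding.
# Unknown or missing encodings pass the data through untouched.
sub decode_body {
    my ( $encoding, $data ) = @_;
    return $data unless defined $encoding && length $encoding;

    my $decoded =
        $encoding =~ /gzip/i    ? Compress::Zlib::memGunzip( $data )
      : $encoding =~ /deflate/i ? Compress::Zlib::uncompress( $data )
      :                           $data;

    # Both Compress::Zlib calls return undef when the data is not valid.
    die "could not decode '$encoding' body\n" unless defined $decoded;
    return $decoded;
}

# Round trip: compress a made-up feed both ways, then decode it again.
my $feed = '<rss version="2.0"><channel/></rss>';
print decode_body( 'gzip',    Compress::Zlib::memGzip( $feed ) ),  "\n";
print decode_body( 'deflate', Compress::Zlib::compress( $feed ) ), "\n";
```

One caveat, as far as I know: some servers send raw deflate data without the zlib header under the "deflate" label, and uncompress() will reject that. This sketch does not try to handle that case.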

LWP was handling redirects automatically, which was all right, except that there needs to be a distinction between temporary redirects (300, 302, 307) and permanent ones (301). I had to add a loop and use simple_request() so I could see what was returned after each request. After a permanent redirect, the URL is permanently updated in the config file (temporary redirects do not affect the config).
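
A rough sketch of that loop, under a few assumptions: follow_redirects is a made-up name, the hop limit of 5 is arbitrary, and $fetch is a callback standing in for the user agent so the example runs offline against canned responses. The rule that a 301 only updates the stored URL while every earlier hop was also a 301 is my reading of "permanent", not something the spec spells out for chains.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

# $fetch: code ref taking an HTTP::Request and returning an HTTP::Response.
# Returns the last response and the URL that should be kept in the config.
sub follow_redirects {
    my ( $fetch, $url, $max_hops ) = @_;
    $max_hops = 5 unless defined $max_hops;    # arbitrary safety limit

    my $stored_url    = $url;    # what would be written back to the config
    my $all_permanent = 1;       # no temporary hop seen so far
    my $response;

    for my $hop ( 0 .. $max_hops ) {
        $response = $fetch->( HTTP::Request->new( GET => $url ) );

        my $code     = $response->code;
        my $location = $response->header( 'Location' );
        last unless $code =~ /^(?:300|301|302|307)$/ && defined $location;

        # Location may be relative, so resolve it against the current URL.
        $url = URI->new_abs( $location, $url )->as_string;

        if ( $code == 301 && $all_permanent ) { $stored_url    = $url }
        else                                  { $all_permanent = 0 }
    }

    return ( $response, $stored_url );
}

# Offline demo with canned responses: /a moved permanently, /b is temporary.
my %site = (
    'http://example.org/a' => HTTP::Response->new( 301, 'Moved Permanently', [ Location => 'http://example.org/b' ] ),
    'http://example.org/b' => HTTP::Response->new( 302, 'Found', [ Location => '/c' ] ),
    'http://example.org/c' => HTTP::Response->new( 200, 'OK' ),
);
my ( $response, $stored_url ) =
    follow_redirects( sub { $site{ $_[0]->uri->as_string } }, 'http://example.org/a' );
print $response->code, " via config URL $stored_url\n";
```

With a real LWP::UserAgent in $ua, the callback would simply be sub { $ua->simple_request( $_[0] ) }.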

Speaking of redirects, code 304 Not Modified is listed under is_redirect(), since that test is true for every 3xx code. That is a little misleading (but true to the specs).
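
In practice this means 304 has to be tested before the generic redirect check. A small sketch using the helpers from HTTP::Status; classify_status and its return labels are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Status qw( is_success is_redirect is_error );

# Hypothetical helper: sort a status code into the buckets the script cares about.
sub classify_status {
    my ($code) = @_;
    return 'not_modified' if $code == 304;    # must come before the 3xx test
    return 'redirect'     if is_redirect($code);
    return 'success'      if is_success($code);
    return 'error'        if is_error($code);
    return 'other';                           # 1xx informational
}

print "$_ => ", classify_status($_), "\n" for 200, 301, 304, 404;
```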

I didn't make any special rules for codes that fall under is_error(). The script currently only reports that an error occurred -- I haven't decided how much of Mark's recommendation I want to follow for this app.

Other than that, I still need to look at authentication and proxy support. Basic auth is currently handled in the URL only, but that poses a security risk, since I use the URL as a key, stored in a user cookie for customization purposes.
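
One possible way around the cookie problem, sketched here as an idea rather than anything the aggregator does yet: strip the credentials out of the URL before it is used as a key, and keep the user name and password separately. split_credentials is a made-up name and the example URL is invented.

```perl
use strict;
use warnings;
use URI;
use URI::Escape qw( uri_unescape );

# Hypothetical helper: returns ( $clean_url, $user, $password ).
# $user and $password are undef when the URL carries no credentials.
sub split_credentials {
    my ($url) = @_;
    my $uri = URI->new($url);

    # Not every scheme has a userinfo part (file:// does not, for one).
    my $userinfo = $uri->can('userinfo') ? $uri->userinfo : undef;
    return ( $uri->as_string ) unless defined $userinfo;

    $uri->userinfo(undef);    # drop "user:password@" from the URL
    my ( $user, $password ) = map { uri_unescape($_) } split /:/, $userinfo, 2;
    return ( $uri->as_string, $user, $password );
}

my ( $key, $user, $password ) =
    split_credentials('http://jake:secret@example.org/feed.rss');
print "cookie key: $key\n";
```

The clean URL can then serve as the key, while the script passes the credentials along with $request->authorization_basic( $user, $password ).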

Mark put up some tests today. Sadly, there are only Atom-based feeds, so I can't try to parse the data, but the status codes they return seem useful.
