
All the Perl that's Practical to Extract and Report


ajt (2546)

UK based. Perl, XML/HTTP, SAP, Debian hacker.

  • PerlMonks: ajt
  • Local LUG: AdamTrickett
  • Debian Administration: ajt
  • LinkedIn: drajt

Journal of ajt (2546)

Tuesday February 24, 2004
08:40 AM

Web Cache Thing

[ #17593 ]

Yesterday I was playing with xmltv feeds. I knocked up a little app that connects to a remote server, grabs a file (in this case XML), stores it in a cache, passes it through a templating engine (XSLT), and spits the result out as a web page.

The code is based on an RSS display tool I previously wrote. It's the same principle: grab something, transform it, and display it. In both cases I don't want to grab the data every time; I'm happy to use a local copy if it's only a few hours old.

I realised that a lot of the code could be abstracted out into a module. The transformation element may be too specialised, but we shall see.

  • The core module is fairly simple: you start with new(), passing in the cache settings for your application.
  • Then you ask for a resource by URI.
    • It looks in the cache to see if it has a local copy that's new enough. The cache would be a plug-in, so you could use something from Cache::Cache, a DBM file, SQLite, or a full-blown SQL DB.
    • If it's not in the cache, or it's expired, it gets the asset. Plug-ins would support file, http, ftp and so on.
    • The new data is cached.
    • Perhaps the data is transformed via a plug-in.
    • Perhaps the transformed data is stored too.
  • Your app does what it wants with the data.
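The steps above might sketch out something like this. The module name (URI::Cache) and the plug-in interface are invented for illustration; here the fetcher is just a coderef, so you could drop in LWP, ftp, a local file read, or anything else:

```perl
#!/usr/bin/perl
# Minimal sketch of the cache-then-fetch idea: serve a local copy if
# it is new enough, otherwise fetch, cache, and return fresh data.
package URI::Cache;
use strict;
use warnings;
use File::Spec;
use Digest::MD5 qw(md5_hex);

sub new {
    my ($class, %args) = @_;
    return bless {
        dir     => $args{dir},              # where cached copies live
        max_age => $args{max_age} || 3600,  # seconds before a copy is stale
        fetch   => $args{fetch},            # plug-in coderef: URI in, content out
    }, $class;
}

sub get {
    my ($self, $uri) = @_;
    my $file = File::Spec->catfile($self->{dir}, md5_hex($uri));

    # Local copy that's new enough? Use it and skip the fetch entirely.
    if (-e $file && (time - (stat $file)[9]) < $self->{max_age}) {
        open my $fh, '<', $file or die "read $file: $!";
        local $/;
        return <$fh>;
    }

    # Otherwise fetch via the plug-in, cache the result, and return it.
    my $data = $self->{fetch}->($uri);
    open my $fh, '>', $file or die "write $file: $!";
    print $fh $data;
    close $fh;
    return $data;
}

1;
```

A transformation step would slot in between fetching and returning, as another coderef or plug-in class.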

I'm not trying to re-invent Squid; I just want a simple URI-getting tool that can cache data. Basically you can call the app many times without it checking the source data every time.

Does this exist on CPAN already? If so, where? If it doesn't exist already, what should I call it?

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Why not just use LWP's mirror() function for this? Does it have to be as complex as you describe?
    • Interesting idea. I knew LWP did some caching, but I thought it was in-memory caching within a single process, not file caching. This is probably enough, certainly for me anyway.

      However, LWP still has to make an HTTP request to determine whether the page has changed, and some feed providers may still dislike the frequency of the requests, even if the responses are only "304 Not Modified".

      My other option is to use cron and wget to obtain the data, and try not to do my data gathering at page-viewing time.

      -- "It's not magic, it's work..."
      • I think if you pull the feed as you're viewing it you're likely to put LESS strain on the remote end than they normally see from a script running 48 times a day. Unless you hit refresh more than 48 times.

        Honestly, anyone getting antsy about an individual user refreshing their RSS feed a few times in a few minutes has way too much time on their hands, and doesn't understand the web.
      • You can probably subclass mirror() and make it check for an explicit expiry date (assuming the HTTP headers are stored somewhere). If you're not past the expiry date, you should be able to use the file without revalidation.
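Putting those last two ideas together: a small wrapper can skip the network entirely while the local file is inside an explicit expiry window, and fall back to mirror()'s conditional GET only once it goes stale. mirror_if_stale() is a made-up helper name, not part of LWP:

```perl
#!/usr/bin/perl
# Sketch: honour a local expiry window before letting LWP's mirror()
# do its If-Modified-Since round trip. mirror() only rewrites $file
# when the server reports new content.
use strict;
use warnings;
use LWP::UserAgent;

sub mirror_if_stale {
    my ($ua, $url, $file, $max_age) = @_;

    # Fresh enough? Use the file as-is, with no HTTP request at all.
    if (-e $file && (time - (stat $file)[9]) < $max_age) {
        return 'fresh';
    }

    # Stale or missing: conditional GET via mirror().
    my $res = $ua->mirror($url, $file);
    return $res->code == 304 ? 'unchanged' : 'mirrored';
}
```

Called from a page-view handler, the common case (a fresh file) costs one stat() and nothing on the wire, which should keep even the touchiest feed providers happy.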