
Journal of LTjake (4001)

Monday June 02, 2003
08:08 PM

RSS freshness...

[ #12570 ]

As described earlier, I'm working on a project dealing with RSS feeds.

Basically, I have a page which displays RSS feeds. It grabs the feeds from a cache. The cache is updated by a script which periodically checks to see if all of the feeds are fresh.

The hard part is checking if the feed is fresh without actually downloading the feed.

There are some syndication rules that can be specified in an RSS feed -- but they're, to me, confusing. The easiest way, I found, was to use some standard HTTP tricks. Checking the "Last-Modified" header by sending an "If-Modified-Since" header, or checking the "ETag" by sending an "If-None-Match" header, seems to eliminate a lot of fuss. If only people used those headers more.
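
The header trick above can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the script the project actually uses; the function names `conditional_headers` and `check_feed` and the shape of the return value are assumptions made for the example. It sends whichever validators were saved with the cached copy and treats a 304 response as "the cache is still fresh".

```python
import urllib.error
import urllib.request


def conditional_headers(last_modified=None, etag=None):
    """Build request headers from validators saved with the cached copy.

    Both arguments are hypothetical cache fields: the Last-Modified and
    ETag values the server sent the last time the feed was fetched.
    """
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def check_feed(url, last_modified=None, etag=None, timeout=30):
    """Return (changed, body, last_modified, etag) for one feed URL."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 200: the feed changed, or the server ignores the validators.
            return (
                True,
                response.read(),
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # 304 Not Modified: no body was sent, so the cached copy is fresh.
            return (False, None, last_modified, etag)
        raise
```

A server that sends neither header will simply answer 200 every time, so the feed gets downloaded on each check; that is the case the last sentence above complains about.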

  • I'm also working on a couple projects dealing with RSS Feeds: POE::Component::RSSAggregator [cpan.org] and XML::RSS::Feed [cpan.org]. I'd be interested to know what kind of stuff you're doing so that maybe we can collaborate on something.
    --

    -biz-

    • Hi,

      It seems like the two modules you've been working on are both used to poll RSS feeds. You're passing a name, a URL and a delay between checks. What I've done instead is schedule a script to run every couple of hours and check to see if the feeds are updated.

      NOTE: This is all quite beta - things might change.

      The data is stored in a simple XML file:

      <opt tmpl_path="/rss/" max="5">
        <feed url="http://www.alternation.net/~cake/cake_news.xml" custom="no" max="4" />
        <feed u

  • I've been through the same hoops too. The theory states that the feed should contain good metadata telling you when its components were made and how fresh they are. In practice, given the generally poor quality of feeds and how often they are incomplete, this doesn't work.

    I wrote XML::RSS::Tools [cpan.org] to handle some of the problems I faced. I mostly use it to transform RSS to HTML via XSLT, but it's not perfect, and there are many other ways of attacking the problem. At least my module did prompt others to take over a

    --
    -- "It's not magic, it's work..."
    • Hey,

      In my reply, here [perl.org], I state that I store the actual feed in the cache. That may change to the results of the output if I really need further optimization -- but it's premature optimization at this stage.

      A while back I was looking at RSS modules for Perl and wasn't thrilled with XML::RSS. However, that quickly changed - it was updated, and now it's part of the core of how my project works. Thanks =)

      I submitted a bug report [cpan.org] indicating that skipHours and skipDays are used improperly. Although the docs s