All the Perl that's Practical to Extract and Report

use Perl

Shlomi Fish (918)
  shlomif@iglu.org.il
http://www.shlomifish.org/
AOL IM: ShlomiFish
Yahoo! ID: shlomif2
Jabber: ShlomiFish@jabber.org

I'm a hacker of Perl, C, Shell, and occasionally other languages. Perl is my favourite language by far. I'm a member of the Israeli Perl Mongers, and contribute to and advocate open-source technologies. Technorati Profile [technorati.com]

Journal of Shlomi Fish (918)

Friday November 17, 2006
03:05 PM

The Feed Must Get Through!

[ #31646 ]

After I recently upgraded my local copy of XML::RSS, I discovered that my aggregated feed, which is generated using XML::Feed from the feeds of all my blogs, could no longer be processed correctly by Akregator. And when trying to validate it, I encountered some problems. This meant that we had introduced some regressions into XML::RSS that had to be fixed.

The first problem I encountered was that I got empty <pubDate></pubDate> elements. Looking at the XML::RSS code, I saw that the appropriate fields were still initialised to an empty string instead of undef, which caused them to be output. In general, the code was in an intermediate state compared to my changes. After merging my local "datetime" branch, I also had to fix some markup injection vulnerabilities that I found, since I didn't escape some of the tags' contents. Here's the issue with my patch for the whole enchilada.
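To make the difference between '' and undef concrete, here is a minimal sketch in plain Perl. The helpers escape_xml and render_element are hypothetical names made up for this illustration; they are not XML::RSS functions, and the real module's output code is organised differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::RSS): escape the characters that
# are significant in XML text content.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: emit <name>value</name>, or nothing if no value is set.
sub render_element {
    my ($name, $value) = @_;
    return '' unless defined $value;
    return "<$name>" . escape_xml($value) . "</$name>";
}

# A field initialised to '' is still defined, so an empty tag is emitted.
print render_element('pubDate', ''), "\n";
# A field initialised to undef is skipped entirely.
print render_element('pubDate', undef), "\n";
# Markup inside the content is escaped instead of being injected.
print render_element('title', 'Fish & <b>Chips</b>'), "\n";
```

The point of the sketch is only that a definedness check cannot tell "no value" from "empty value" when the default is '', and that every piece of text content has to pass through the escaping step before it is written.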

The next errors had to do with "guid". In XML::RSS, "permaLink" holds the guid URL if isPermaLink is true, and "guid" holds it if it is false. However, permaLink was equal to 1 instead of the URL. As it turned out, the parsing logic was outdated and had to be fixed. The fix, along with test cases, is in my local repository.
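The mapping described above can be sketched roughly as follows. guid_to_item_fields is a hypothetical helper written for this illustration, not the parser code in XML::RSS; the one outside fact it relies on is that RSS 2.0 treats a missing isPermaLink attribute as "true".

```perl
use strict;
use warnings;

# Hypothetical helper (not XML::RSS's real parser): decide which item key
# receives the text of a <guid> element.
sub guid_to_item_fields {
    my ($guid_text, $is_permalink_attr) = @_;

    # RSS 2.0: isPermaLink is optional and defaults to "true".
    my $is_permalink = !defined($is_permalink_attr)
        || $is_permalink_attr eq 'true';

    # Store the URL itself under the chosen key, never the boolean.
    return $is_permalink
        ? (permaLink => $guid_text)
        : (guid      => $guid_text);
}

my %item = guid_to_item_fields('http://example.org/post/1', 'true');
print "permaLink = $item{permaLink}\n";

%item = guid_to_item_fields('tag:example.org,2006:1', 'false');
print "guid = $item{guid}\n";
```

A value of 1 in permaLink is what one would expect to see if the truth value of the attribute, rather than the element's text, ended up being stored.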

Next I found out that some of the items were missing their date-time stamp. I noticed that it happened with an RSS 1.0 feed, and as it turned out, the <dc:date> elements were not handled correctly. A close inspection revealed that XML::Feed initialised XML::RSS with version => "2.0", and so, due to changes in the modules' initialisation in XML::RSS, the modules were not defined during parsing. So I added a workaround (with a test) that defines the extra modules again when parsing. I can't see why version would be useful for anything except output.
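The shape of that workaround can be sketched like this. Everything here is an assumption made for illustration: modules_for is a hypothetical helper, and the rule that default modules are tied to version "1.0" is a guess at the behaviour described, not a copy of the XML::RSS source. Only the Dublin Core namespace URI is a known constant.

```perl
use strict;
use warnings;

# Namespace URI of the Dublin Core module, which provides <dc:date>.
my %DEFAULT_MODULES = ('http://purl.org/dc/elements/1.1/' => 'dc');

# Hypothetical helper (not XML::RSS's real code). ASSUMPTION: the default
# modules are only registered for version "1.0", unless we are parsing.
sub modules_for {
    my (%args) = @_;
    my $wanted = $args{parsing} || ($args{version} // '') eq '1.0';
    return $wanted ? +{ %DEFAULT_MODULES } : +{};
}

# Object created with version => "2.0" and used for output only.
my $for_output = modules_for(version => '2.0');
# The workaround: when parsing, define the modules regardless of version.
my $for_parsing = modules_for(version => '2.0', parsing => 1);

print "2.0, output only: ", scalar(keys %$for_output),  " module(s)\n";
print "2.0, parsing:     ", scalar(keys %$for_parsing), " module(s)\n";
```

Without the parsing case, a parser that was constructed with version => "2.0" has no prefix registered for the Dublin Core namespace, so a <dc:date> in an RSS 1.0 feed has nowhere to go.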

And afterwards, the feed validated, and Akregator could read it. I had a lot of other plans for today, which had to be delayed because of this work on XML::RSS. But a hacker's got to do what a hacker's got to do.

  • In XML::RSS "permaLink" holds the guid URL if isPermaLink is true, and "guid" holds it if it is false.

    Sounds like bad API design to me. permaLink should always be the permalink, if there is one, and guid should always be the GUID, if there is one. How this is specified in the wire format is something the API should not expose.

    I don’t know if you’re at liberty to make such changes, though.

    (In fact, I boggle at the effort you’re putting into a module for RSS… Atom’s

    • Sounds like bad API design to me. permaLink should always be the permalink, if there is one, and guid should always be the GUID, if there is one. How this is specified in the wire format is something the API should not expose.

      It is bad API design in my opinion. But this was the API since XML::RSS 1.05 [cpan.org]. BTW, in RSS 2.0 what happens is that the guid element has an isPermaLink attribute which can be "true" or "false". If it is true, then permaLink will hold the contents of the "guid" tag, and if it's fals

    • > Sounds like bad API design to me. [....]

      Indeed. XML::RSS is mostly a big messy patchwork.

      > I don’t know if you’re at liberty to make such changes, though.

      The current focus is to slowly get the test coverage up and bugs fixed; when we have good coverage we can refactor the code and the API (while staying compatible with the old one). As you point out, there really isn't much need for innovations in an RSS module.

      I got sucked into looking after the module after finding a bug (like Shlomi
      --

      -- ask bjoern hansen [askbjoernhansen.com], !try; do();