Journal of mdxi (4658)

Tuesday April 12, 2005
03:16 PM


[ #24151 ]

The Ninety-Ninety Rule has reared its ugly head on Olive development. After three days of whipping along at thoroughly unbelievable (for me) speeds, I hit a roadblock named XML. Or RSS, or both, depending on exactly what time it was.

I started out trying to slurp in and parse RSS feeds with XML::RSS::Parser, because it was the first match on CPAN and it looked fairly promising. While reading its docs I discovered XML::RAI, which looked like a magic bullet that did exactly what I wanted: read an RSS feed and turn it into a uniform, easily accessible data structure.

This is harder than it sounds because there are so many varieties of RSS, including 2 mutually incompatible versions which are both named "RSS 2.0".

XML::RAI seemed to work on initial tests, and I was very happy, but it fell down hard on all but the smallest datasets. When I first tried the BBC's front page feed -- 15k, 30-40 entries, not a big file -- each item iteration took noticeably longer than the last. First a hint longer, then a second longer, then 3, then 5, then 15, then 30. top(1) revealed that RAM usage had shot to 135M (real!) and was climbing by another 20M every second. I killed it before the kernel had to do it. Apparently the root problem is that it's Very Clever and uses autoloading to create more or less everything on the fly. Structures, objects, creators, accessors, all of it generated at runtime by a fairly small, very dense wodge of code.

Elegance is good, but Clever kills.

I really wanted to make SOMETHING show up in the top pane before sleeping, so I started ransacking CPAN and Google for options. My problem at this phase can be seen first-hand by searching for "xml rss perl" and laughing heartily to yourself. I tried...I tried a lot of things. I can only remember XML::Twig and XML::Parser, but there was a frenzy of cpan installs late last night. Finally I discovered XML::Simple, which did pretty much what I wanted, except that it knew nothing about RSS, so I'd have to write that part myself. I could deal with that. RSS2 is well-specified and simple enough that I picked it and made fairly short work of things.
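For illustration, here is a minimal sketch of the "generic XML parser plus a thin RSS2 layer" idea. It is written in Python with the standard library's ElementTree, not in Perl with XML::Simple, so it is not Olive's actual code. The sample feed, the field list and the name parse_rss2 are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this sketch.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <pubDate>Tue, 12 Apr 2005 15:16:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

# Per-item fields this sketch cares about (an assumption, not a full RSS2 list).
ITEM_FIELDS = ("title", "link", "description", "pubDate", "guid")


def parse_rss2(xml_text):
    """Turn RSS 2.0 text into a plain dict with a uniform list of item dicts."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    items = []
    for item in channel.findall("item"):
        # Missing elements come back as None, so every item has the same keys.
        items.append({field: item.findtext(field) for field in ITEM_FIELDS})
    return {
        "title": channel.findtext("title"),
        "link": channel.findtext("link"),
        "items": items,
    }


if __name__ == "__main__":
    feed = parse_rss2(SAMPLE_FEED)
    for story in feed["items"]:
        print(story["title"], story["link"])
```

Giving every item the same keys, with None for missing elements, keeps the display code free of per-feed special cases.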

EXCEPT for discovering that RSS2 dates are themselves specified by RFC-822 (already knew this part), which was written in 1982, which was before the Y2K era. That's right, even though everyone uses 4-digit years, 2-digit years are legal. And the BBC, who are part of my test suite, use 2-digit years in their RSS2 feeds. Great big special case hackery. Then I went to bed. At 0835h. And woke up 2 hours later.
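The special case itself is small once it is isolated. Below is a hedged sketch in Python (not the Perl used in Olive) that accepts both 2-digit and 4-digit years. The pivot of 50 follows the guidance RFC 2822 gives for obsolete 2-digit years (00-49 become 20xx, 50-99 become 19xx). The function names and the limited set of accepted zones (GMT, UT, Z, or a numeric offset) are assumptions of this example.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Optional day name, then day, month, 2- or 4-digit year, time, zone.
DATE_RE = re.compile(
    r"(?:[A-Za-z]{3},\s*)?(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4}|\d{2})\s+"
    r"(\d{2}):(\d{2})(?::(\d{2}))?\s+(\S+)"
)


def expand_year(year_text, pivot=50):
    """Widen a 2-digit year; 4-digit years pass through unchanged."""
    year = int(year_text)
    if len(year_text) == 4:
        return year
    return 1900 + year if year >= pivot else 2000 + year


def parse_zone(zone):
    """Handle only GMT/UT/Z and +HHMM/-HHMM (a simplification for this sketch)."""
    if zone in ("GMT", "UT", "Z"):
        return timezone.utc
    match = re.fullmatch(r"([+-])(\d{2})(\d{2})", zone)
    if match is None:
        raise ValueError("unsupported zone: %r" % zone)
    sign = 1 if match.group(1) == "+" else -1
    offset = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
    return timezone(sign * offset)


def parse_rfc822(text):
    """Parse an RFC-822 style date with either year width."""
    match = DATE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognised date: %r" % text)
    day, month, year, hour, minute, second, zone = match.groups()
    return datetime(
        expand_year(year), MONTHS[month.title()], int(day),
        int(hour), int(minute), int(second or 0),
        tzinfo=parse_zone(zone),
    )


if __name__ == "__main__":
    print(parse_rfc822("Tue, 12 Apr 05 15:16:00 GMT"))
    print(parse_rfc822("Tue, 12 Apr 2005 15:16:00 GMT"))
```

Both calls in the usage block resolve to the same instant, so the rest of the code never needs to know which year width the feed used.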

Since then I've hooked up the whole Add -> Fetch -> Parse -> (Re)Display chain, and it's working pretty well. Except that Unicode fucks my spacing in the story selector pane. Double-width chars stomp at right, odd half-width spaces (Cyrillic) yank stuff to the left. Spent about 3 hours fooling with that, including a really audacious attempt at creating 3 separate listboxes linked by OnSelectChange callbacks. It actually worked, except that only the focused listbox has the reverse-video indicator, and nonfocused ones don't scroll. Their selected item did change and become highlighted in sync with the focused list, though. Neat, but a waste of an hour. For right now I have given up on Unicode correctness.
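One common way to attack this kind of misalignment is to pad by terminal columns rather than by character count. The sketch below is Python, not Olive's Perl, and it is not a diagnosis of what the listbox widget actually does. It uses the Unicode East Asian Width property to count wide and fullwidth characters as two columns and combining marks as zero. Counting "ambiguous" characters (which include Cyrillic letters) as one column is an assumption, and the function names are made up for the example.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns a string occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        # "W" (wide) and "F" (fullwidth) take two columns; everything else,
        # including the ambiguous class, is assumed to take one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_to_width(text, columns):
    """Truncate or space-pad text so that it fills exactly `columns` columns."""
    kept, used = [], 0
    for ch in text:
        ch_width = display_width(ch)
        if used + ch_width > columns:
            break
        kept.append(ch)
        used += ch_width
    return "".join(kept) + " " * (columns - used)


if __name__ == "__main__":
    for title in ("BBC News", "日本語のニュース", "Новости дня"):
        print("|" + pad_to_width(title, 12) + "|")
```

If a wide character would straddle the column limit, it is dropped and a space is added in its place, so every row ends at the same column.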

So what's left? Pretty much everything dealing with auto-fetching feeds and folding in story updates. Hopefully that won't be too difficult, as it involves neither XML nor Unicode.
