All the Perl that's Practical to Extract and Report

wirebird (8007)


Author of Wirebird (also known by its current version, Gamehawk), which is intended to provide separate interfaces to the same community via mailing list and webforum (among others), chiefly because culture clash is fun to watch. Someday I should submit something to CPAN.

Journal of wirebird (8007)

Thursday September 20, 2007
11:55 AM

Parsing b0rken RSS

[ #34493 ]
Yeah, yeah, I know... there is no RSS that isn't broken, one way or another, but not everybody has boarded the Atom bus yet.

So I'm using XML::RSS to parse a variety of often-broken feeds (Simple Machines Forum, just as a for-instance, appears to encode its entries, *then* truncate them, resulting in dangling tags or even in "..." landing right in the middle of multibyte characters, tags, whatever). And it's dying fairly often.

No problem, I'll just wrap it in an eval, and skip that feed until its issues get resolved (usually by the corrupted entry expiring, hopefully before another corrupted one jumps on board).

And so I do. And it dies inside the eval. Wait, what?

I mean, every once in a while I do run across modules that are ill-behaved enough to up and die instead of throwing something more catchable. But I thought eval was a magic fix for that.

Googling a little, the answer on perlmonks and elsewhere seems to be "Well, of course it should die irrevocably. You shouldn't be using invalid XML anyway." Fine, fine, that's the first thing I'll outlaw when I'm made Empress Of All Intarwebz. In the meantime, back in the real world, I'd *like* to be able to recover and go on to the XMLs that ARE valid (so far as you can say "valid" about RSS), thankyouverymuch.

eval's never failed me before, though. I might actually have to learn how it works so I can figure out what I'm doing wrong.

But seriously. How can you screw up
eval( $feed->parse($page));

Apparently, if you're me, "pretty easily."
  • Perhaps eval (); should be eval { };?

    use XML::RSS;
    my $f = XML::RSS->new;
    # eval BLOCK traps the die from parse(); the error message is left in $@
    eval { $f->parse( 'x' ); };
    print 'okay';
    • Unless you mis-transcribed, LTJake is right. What you're doing is "eval EXPR", which evaluates the expression normally (so a die in parse() happens before the eval ever starts), then assumes the result is a string containing Perl code and evaluates that. What you want is "eval BLOCK", which runs the block of code, catching exceptions.
      • Yep, that's exactly what it is. The only mis-transcription was that I'd formatted it all pretty in my code just like it was a block, and it never once registered that hey, those are awfully rounded curly-braces. Not even when I left it all on one line in my rant. (And I guess that parse() must return a 1 or something equally harmless on success.)

        Bah, decongestants. This is why I'm not working on production code today. And clearly should not even be let near even the quick hack stuff, either.

        Thanks, g
    • You know, you'd think that when it choked on a semi-colon I put in there (when I added a hello-world just to make sure it was hitting that code on *working* feeds), it would have maybe sent up a red flag for me. But nooooo, I sez to myself, I sez, "Gee, I can't even type a working print statement today, I'm just making this worse" and reverted it all instead of thinking about it.

      Yyyeah. No more coding for me today.
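
For anyone who lands here with the same problem, here is a minimal sketch of the difference discussed in this thread, plus the "skip the broken feed and move on" loop from the post. It uses only core Perl: parse_feed() is a hypothetical stand-in for $feed->parse($page) that dies on junk input, and its return value of 1 on success is an assumption, not something checked against XML::RSS.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $feed->parse($page): dies on input that does not
# look like RSS, returns 1 otherwise (the return value is an assumption).
sub parse_feed {
    my ($page) = @_;
    die "not well-formed\n" unless $page =~ /^<rss\b/;
    return 1;
}

# eval EXPR: the argument is computed first, outside the protection of this
# eval, so the die escapes it. The outer eval BLOCK is only here so the
# script survives the demonstration.
my $reached = 0;
eval {
    eval( parse_feed('x') );   # dies while computing the argument
    $reached = 1;              # would run if the inner eval had trapped it
};

# eval BLOCK: the code runs inside the eval, so a die is trapped and the
# message is left in $@. Bad feeds get skipped, good ones carry on.
my @pages = ('<rss version="2.0"></rss>', 'x', '<rss></rss>');
my (@parsed, @skipped);

for my $page (@pages) {
    my $ok = eval { parse_feed($page); 1 };
    if ($ok) {
        push @parsed, $page;
    }
    else {
        push @skipped, $page;
        warn "skipping feed: $@";
    }
}

print scalar(@parsed), " parsed, ", scalar(@skipped), " skipped\n";
```

The trailing "; 1" inside the block is a common habit so that success is judged by the eval's return value rather than by testing $@ alone. It also accounts for the semicolon complaint mentioned above: a semicolon inside the parentheses of eval( ... ) is a syntax error, while inside eval { ... } it is just a statement separator.
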
  • You might want to try XML::Liberal and XML::Feed for your feed parsing needs.
    • XML::Feed is just an API-unification wrapper around XML::RSS and XML::Atom::Feed, innit? I'm using XML::Atom::Syndication::Feed instead, for reasons I can't quite remember just now, but, uh, I'm sure they were good ones at the time.

      XML::Liberal might help, though I assume it has the same issue XML::Twig (which I saw someone recommend as an alternative) has in that it's not RSS-specific, and this was supposed to be a quick and dirty hack to pull title, date, content, and link out of feeds... and I didn't