All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Re: (Score:1)
My hammer of choice is forcing the input to XHTML using HTMLTidy, then attacking it with XPath. XPath rocks extremely hard. HTML::Tidy (there’s Andy Lester again) and XML::LibXML are excellent tools for this approach.
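For anyone who wants to try the tidy-then-XPath approach, here is a minimal sketch. It is not the poster's actual script; the sample markup, the Tidy options chosen, and the `x` namespace prefix are all assumptions made for illustration. The one thing that tends to trip people up is that Tidy's XHTML output lives in the XHTML namespace, so the XPath expressions need a registered prefix.

```perl
use strict;
use warnings;
use HTML::Tidy;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical tag-soup input: unclosed <p> elements, no doctype.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>One<p>Two <a href="/x">link</a></body></html>';

# Ask Tidy for XHTML. The option choices are assumptions for this sketch:
# numeric entities avoid needing the XHTML DTD to resolve things like &nbsp;,
# and omitting the doctype keeps the XML parser from looking for that DTD.
my $tidy = HTML::Tidy->new({
    output_xhtml     => 1,
    numeric_entities => 1,
    doctype          => 'omit',
    tidy_mark        => 0,
});
my $xhtml = $tidy->clean($html);

# Parse the cleaned-up markup as ordinary XML.
my $doc = XML::LibXML->load_xml(string => $xhtml);

# XHTML elements are namespaced, so bind a prefix ("x" is arbitrary)
# and use it in every step of the XPath expression.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

my @hrefs = map { $_->value } $xpc->findnodes('//x:a/@href');
my @paras = $xpc->findnodes('//x:p');

print "link: $_\n" for @hrefs;
print scalar(@paras), " paragraphs\n";
```

Forgetting the namespace prefix is the usual reason an XPath query against tidied XHTML silently returns nothing.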
Tidy first? (Score:2)
Re:Tidy first? (Score:1)
I didn’t think of that because I actually use XSLT most of the time (nowadays via a Perl wrapper script around XML::LibXSLT and the aforementioned modules), and there’s something really strange going on with namespaces in a DOM built using libxml’s HTML parser, which causes strange misbehaviour in XSL transforms that I never figured out (I just had hours of debugging fun with it). When I started out, I didn’t even have the option because I was in fact using libxslt’s xsltproc utility, and that doesn’t even have a way to parse HTML input.

If you’re actually just parsing the input using libxml and then doing all the work in Perl, you can probably get away without turning things into XHTML first; you’re right.
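A rough sketch of that libxml-only route, for comparison. The sample markup and the `recover` setting are assumptions for illustration, not something taken from the thread. Because the HTML parser builds elements without a namespace, plain XPath works without registering a prefix.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical tag-soup input, same shape as a typical scraped page.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>One<p>Two <a href="/x">link</a></body></html>';

# Let libxml's HTML parser repair the markup itself.
# recover => 2 means "recover from errors and stay quiet about them".
my $doc = XML::LibXML->load_html(string => $html, recover => 2);

# No namespace on these elements, so unprefixed XPath is enough.
my @hrefs = map { $_->value } $doc->findnodes('//a/@href');
my @paras = $doc->findnodes('//p');
my $title = $doc->findvalue('//title');

print "title: $title\n";
print "link: $_\n" for @hrefs;
print scalar(@paras), " paragraphs\n";
```

This skips the Tidy step entirely, at the cost of relying on libxml's own error recovery, which may repair badly broken pages differently than Tidy would.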