  • See the parse_html_* [cpan.org] methods in LibXML.
    • I wasn't sufficiently clear in my original message. I was trying the parse_html_* methods in XML::LibXML, and they were whining about broken HTML in the two pages I was playing with. So I said "screw it" and went back to parsing those with HTML::* modules.

      --Nat

      • Doh. HTML parsers that can't parse broken HTML aren't that useful :)

        Have you tried HTML::TreeBuilder with Class::XPath [cpan.org]?
        • by gnat (29) on 2003.07.02 10:45 (#21674)
          I haven't, but boy that's really cute. I was wondering the other day whether there were more general XPath modules available. You know, with a little optimization (the ability to search a tree once but have multiple possible XPath expressions and associated actions to run at each step), you could use XPath as the basis for your optimizer--write XPath expressions for the things to optimize.

          Ah yes, I've known about XPath for three days. Why wouldn't I assume I've had an original thought :-)

          --Nat
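Nat's idea of searching a tree once while testing several path expressions (each with its own action) at every step can be sketched in core Perl. This is a minimal, hypothetical illustration, not the Class::XPath or XML::LibXML API: the tree format (hashrefs with `tag`, `attr` and `children`, text nodes as plain strings) and the helper names `compile_path`, `make_rules` and `walk` are made up for the example, and the path language is a tiny XPath-like subset (`/a/b` absolute paths, `//a` descendant matches, `*` as a one-step wildcard), not real XPath.

```perl
use strict;
use warnings;

# Hypothetical tree format for this sketch: each element is a hashref
# { tag => NAME, attr => {...}, children => [...] }; text nodes are plain strings.

# Compile one path from a tiny XPath-like subset (NOT real XPath) into a
# predicate that receives the stack of tag names from the root to the element:
#   '/html/body/p'   absolute path, the whole stack must match
#   '//a'            descendant path, the stack only has to end with these steps
#   '*'              a step that matches any single tag name
sub compile_path {
    my ($path) = @_;
    my $anywhere = $path =~ s{^//}{};    # true if the path started with '//'
    $path =~ s{^/}{};
    my @steps = split m{/}, $path;
    return sub {
        my @stack = @_;
        return 0 if @stack < @steps;
        return 0 if !$anywhere && @stack != @steps;
        my @tail = @stack[ @stack - @steps .. $#stack ];
        for my $i (0 .. $#steps) {
            return 0 unless $steps[$i] eq '*' || $steps[$i] eq $tail[$i];
        }
        return 1;
    };
}

# Pair each path with an action: make_rules(PATH => CODEREF, ...).
sub make_rules {
    my @pairs = @_;
    my @rules;
    while (my ($path, $action) = splice @pairs, 0, 2) {
        push @rules, [ compile_path($path), $action ];
    }
    return \@rules;
}

# Walk the tree once; at every element, run each action whose path matches.
sub walk {
    my ($node, $rules, @stack) = @_;
    return unless ref $node;              # text nodes carry no tag
    push @stack, $node->{tag};
    for my $rule (@$rules) {
        my ($matches, $action) = @$rule;
        $action->($node) if $matches->(@stack);
    }
    walk($_, $rules, @stack) for @{ $node->{children} || [] };
}

my $doc = {
    tag      => 'html',
    children => [
        { tag => 'head', children => [ { tag => 'title', children => ['Demo'] } ] },
        { tag => 'body', children => [
            { tag => 'p', children => [
                'See ',
                { tag => 'a', attr => { href => 'http://cpan.org/' }, children => ['CPAN'] },
            ] },
            { tag => 'a', attr => { href => 'http://use.perl.org/' }, children => ['use Perl'] },
        ] },
    ],
};

my (@links, @titles, @paras);
my $rules = make_rules(
    '//a'           => sub { push @links,  $_[0]{attr}{href} },
    '/html/*/title' => sub { push @titles, $_[0]{children}[0] },
    '/html/body/p'  => sub { push @paras,  $_[0] },
);
walk($doc, $rules);    # a single pass over the tree serves all three rules

print "links: @links\n";                       # links: http://cpan.org/ http://use.perl.org/
print "titles: @titles\n";                     # titles: Demo
print "paragraphs: ", scalar(@paras), "\n";    # paragraphs: 1
```

Each element is visited once and every rule's predicate is checked there, so the cost grows with elements times rules instead of one full traversal per expression. A serious version would probably compile all the patterns into one shared matcher rather than testing them one by one.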