All the Perl that's Practical to Extract and Report

  • If that's possible, I would be totally happy to include CSS selectors in HTML::TreeBuilder::XPath (and actually even in XML::XPathEngine). I would love the module to auto-detect which query language is used, but I don't think that's possible, as the two syntaxes overlap.

    --
    mirod
    • Where does CSS::SAC fit into this discussion? Thanks, Christopher
      • Hm, I haven't looked at CSS::SAC. Looks like it's a SAX parser for CSS? My code just uses CSS selectors as a replacement for XPath, and it could probably make use of a CSS selector parser to be complete.
      • Uh, I used Google Code Search and found what is probably duplicated work, done in CSS::SAC [google.com] in January 2005.

        Looks like CSS::SAC on CPAN has not been updated for a long time (the last update was in September 2004). Having a separate, pure-Perl implementation (independent of any other CPAN module) would not be a bad thing, though.
        • Indeed, I just thought I'd point it out, as I have been looking for something in Perl as good as ScrAPI. I don't have the cycles to write one and haven't found one yet (CSS::SAC being the closest). However, if we can build something better, I am happy. :-) Christopher
    • That's totally possible with just a few lines of code, and yeah, auto-detecting whether a query is a CSS selector or XPath would be impossible. I'm not sure including the feature in H::TB::XPath is the right thing to do. Maybe it is.
      • I hadn't looked at this at all, but I see that your HTML::Selector::XPath is indeed most of what's needed. Nice job.

        I have to think about it, but at the very least I will add something in the docs about using HTML::Selector::XPath in order to use CSS selectors with XML/HTML modules.

        --
        mirod
  • This is very nice! Thanks!

    I've been scraping HTML for a while (since sitescooper), and XPath is definitely the right way to do it, I think.
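
The idea discussed in this thread, translating a CSS selector into an equivalent XPath expression so that any XPath-capable module can run it, can be illustrated with a small language-neutral sketch. This is not the HTML::Selector::XPath implementation; the function names `css_to_xpath` and `simple_to_xpath` are hypothetical, and the sketch assumes only type, `#id`, and `.class` selectors joined by the descendant (space) and child (`>`) combinators.

```python
import re

# One simple selector: an optional tag name (or *), then any number of
# #id / .class parts. Anything else (attributes, pseudo-classes) is rejected.
_SIMPLE = re.compile(r"^(?P<tag>[A-Za-z][\w-]*|\*)?(?P<rest>(?:[#.][\w-]+)*)$")
_PART = re.compile(r"([#.])([\w-]+)")


def simple_to_xpath(simple):
    """Translate one simple selector such as 'div.note#main' to an XPath step."""
    match = _SIMPLE.match(simple)
    if not simple or not match:
        raise ValueError(f"unsupported selector: {simple!r}")
    step = match.group("tag") or "*"
    for kind, name in _PART.findall(match.group("rest")):
        if kind == "#":
            step += f"[@id='{name}']"
        else:
            # Match one token inside a space-separated class attribute.
            step += (
                "[contains(concat(' ', normalize-space(@class), ' '), "
                f"' {name} ')]"
            )
    return step


def css_to_xpath(selector):
    """Translate a selector that uses only space and '>' combinators."""
    # Pad '>' with spaces so it always becomes its own token.
    tokens = re.sub(r"\s*>\s*", " > ", selector.strip()).split()
    xpath, axis = "", "//"
    for token in tokens:
        if token == ">":
            axis = "/"  # child combinator: next step is a direct child
            continue
        xpath += axis + simple_to_xpath(token)
        axis = "//"  # default back to the descendant combinator
    return xpath


if __name__ == "__main__":
    for css in ("ul li", "#main", "div > p.note"):
        print(css, "=>", css_to_xpath(css))
```

The sketch also hints at why auto-detection is hard: a bare string such as `div` is both a valid CSS type selector and a valid relative XPath expression, so syntax alone cannot say which language the caller meant. Here the caller decides by explicitly calling the translator.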