  • Are you asking about XML::LibXML::Node’s nodePath method?

    • More or less what I was going to suggest.

      Keep in mind that 'nodePath' will return something like:


      Which, while correct, might not be the most flexible specification... maybe you really wanted:

      /html/body/div[h2='The table']/table/tr[td[1]='this row']/td[position()=../../tr[1]/td[.='this column']/position()]
      • Exactly. That's what I don't like about the Mozilla extension's approach either.

        I might want the module to generate several candidate XPath expressions, so that the user can pick the one that makes the most reliable scraper.
        • You’ll run into combinatorial explosion for even a relatively short path. There are a great many ways to address a single element.

          I guess what you want, given your comparison with Template::Extract, is a way to accept multiple nodes and then ask for the strictest possible XPath expression (including shared attribute values on any ancestral elements etc) that matches them all.

          Hmm, that would be cool.
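
The "accept multiple nodes, return one expression that matches them all" idea can be sketched roughly as follows. This is only an illustration in Python with the standard library's xml.etree.ElementTree, not anything XML::LibXML or Template::Extract provides. The function names `ancestor_chain` and `common_path` are made up for this example, and it assumes all nodes sit at the same depth and that attribute values contain no single quotes. It walks the ancestor chains in parallel and keeps, at each level, the tag name (or `*` if the tags differ) plus every attribute value shared by all the nodes.

```python
import xml.etree.ElementTree as ET

def ancestor_chain(root, node):
    """Return [root, ..., node]. ElementTree has no parent pointers, so map them first."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = [node]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain[::-1]

def common_path(root, nodes):
    """Hypothetical helper: one path built from what all `nodes` have in common."""
    chains = [ancestor_chain(root, n) for n in nodes]
    if len({len(c) for c in chains}) != 1:
        raise ValueError("this sketch only handles nodes at the same depth")
    steps = []
    for level in zip(*chains):
        tags = {el.tag for el in level}
        step = tags.pop() if len(tags) == 1 else "*"
        # Keep only the attribute name/value pairs present on every element here.
        shared = set(level[0].attrib.items())
        for el in level[1:]:
            shared &= set(el.attrib.items())
        for name, value in sorted(shared):
            step += "[@%s='%s']" % (name, value)
        steps.append(step)
    return "/" + "/".join(steps)

doc = ET.fromstring(
    "<html><body>"
    "<div class='entry' id='a'><span class='title'>One</span></div>"
    "<div class='entry' id='b'><span class='title'>Two</span></div>"
    "<div class='ad'><span class='title'>Buy</span></div>"
    "</body></html>"
)
titles = doc.findall("./body/div[@class='entry']/span")
print(common_path(doc, titles))
# /html/body/div[@class='entry']/span[@class='title']
```

Because the differing `id` values are dropped and the shared `class` values are kept, the resulting expression matches both selected nodes while still excluding the `ad` block. A real version would also need to weigh positional and text-content predicates, which is where the combinatorial explosion comes in.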

    • Yeah, this is quite similar to what I have in mind, except it's libxml based (I want one for HTML::Tree for some reason). But it'll definitely be helpful. Thank you!
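
For a tree that isn't libxml based, a nodePath-style positional path can be computed by walking up through the parents and counting same-named siblings. Below is a minimal sketch, again in Python with xml.etree.ElementTree purely for illustration. It is not how XML::LibXML or HTML::Tree implement anything, and `positional_path` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

def positional_path(root, target):
    """Return an absolute, index-based path to `target`, in the spirit of nodePath."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        step = node.tag
        if parent is not None:
            same_tag = [sib for sib in parent if sib.tag == node.tag]
            # Add a 1-based index only when the tag alone is ambiguous.
            if len(same_tag) > 1:
                step += "[%d]" % (same_tag.index(node) + 1)
        steps.append(step)
        node = parent
    return "/" + "/".join(reversed(steps))

doc = ET.fromstring(
    "<html><body><div><h2>The table</h2><table>"
    "<tr><td>x</td><td>this column</td></tr>"
    "<tr><td>this row</td><td>value</td></tr>"
    "</table></div></body></html>"
)
cell = doc.find("./body/div/table/tr[2]/td[2]")
print(positional_path(doc, cell))
# /html/body/div/table/tr[2]/td[2]
```

The result is correct but brittle in exactly the way discussed above: insert one row or one wrapper element and the indices no longer point at the same cell, which is why a content-based expression is often the better thing to hand to a scraper.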