Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • XPath (Score:3, Informative)

    XPath is incredible.

    Yep. Wrap your brain around this hack:

    document(//a/@href[contains(., '.html')])/html/head/title

    In the context of an XSLT stylesheet (or something that provides the document() function to retrieve a document by name), this little tidbit finds all of the links containing .html in the href, fetches them, parses them, and returns the title of each page.

    A spider. In one expression.

    Assign that to a nodeset and reapply the expression, and you're going two levels out. (Or just nest