All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
XPath (Score:3, Informative)
document() function to retrieve a document by name), this little tidbit finds all of the links containing .html in the href, fetches them, parses them, and returns the title of each page. A spider. In one expression.
Assign that to a nodeset and reapply the expression, and you're going two levels out. (Or just nest the document() functions into something really contorted.)
Did you try munging the HTML with tidy first? That works a decent amount of the time. (You can have tidy emit XML/XHTML if you don't want to deal with HTML parsers.)
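The actual XPath expression didn't survive in this copy of the comment, so here is only a rough sketch of the same idea done from Perl with XML::LibXML rather than XSLT's document(): parse a page leniently as HTML, select every href containing .html, parse each target, and pull out its title. To keep it self-contained, %pages is a hypothetical stand-in for fetching over the network, and parse_page() and titles_one_level_out() are illustrative names, not part of any module.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for fetching pages by name: name => HTML source.
my %pages = (
    'index.html' => '<html><head><title>Home</title></head><body>'
        . '<a href="a.html">A</a> <a href="b.html">B</a> '
        . '<a href="notes.txt">Notes</a></body></html>',
    'a.html' => '<html><head><title>Page A</title></head><body></body></html>',
    'b.html' => '<html><head><title>Page B</title></head><body></body></html>',
);

# Parse with libxml2's HTML parser, which tolerates tag soup.
sub parse_page {
    my ($name) = @_;
    return XML::LibXML->load_html(
        string          => $pages{$name},
        recover         => 1,
        suppress_errors => 1,
    );
}

# One level of the spider: follow each href containing ".html",
# parse the target page, and collect its <title> text.
sub titles_one_level_out {
    my ($start) = @_;
    my $doc = parse_page($start);
    my @titles;
    for my $href ($doc->findnodes('//a[contains(@href, ".html")]/@href')) {
        my $target = $href->value;
        next unless exists $pages{$target};    # skip pages we can't "fetch"
        push @titles, parse_page($target)->findvalue('//title');
    }
    return @titles;
}

print "$_\n" for titles_one_level_out('index.html');
```

This should print "Page A" and "Page B"; notes.txt is filtered out by the contains() test. Running the same XPath over the pages you just parsed is the "two levels out" step.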
Re:XPath (Score:2)
(Morbus, you getting this for Spidering Hacks? :-)
--Nat
Re:XPath (Score:1)
Re:XPath (Score:1)
Perl XPath functions (Score:1)
Don't forget to talk about XML::LibXSLT's ability to write and register XPath extension functions written in Perl.
Of course 1.53 has memory bugs, but if you get Matt's CVS copy, you can have Perl callbacks from XSLT. This is incredibly useful: say you want to access Apache req objects from XSLT, using closures, in a handler():
$xslt->register_function($urn, 'get_request', sub { &get_request($self,@_) } );
Write get_request() to handle arguments to an XPath function (which can b
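Why the closure in that register_function call matters: the XSLT engine hands the Perl callback only the arguments from the XPath call, so anything else the function needs (here the handler's $self and its request) has to be captured when the sub is created. Below is a minimal sketch of that pattern in core Perl. register_function() and call_function() are hypothetical stand-ins for XML::LibXSLT's own function table, and My::Handler is an illustrative class, not Apache's request API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XSLT engine's extension-function table,
# keyed by "{namespace-uri}local-name".
my %registry;

sub register_function {
    my ($urn, $name, $code) = @_;
    $registry{"{$urn}$name"} = $code;
    return;
}

# What the engine would do when a stylesheet calls urn:name(args...):
# it passes only the XPath arguments to the registered sub.
sub call_function {
    my ($urn, $name, @args) = @_;
    my $code = $registry{"{$urn}$name"} or die "no function {$urn}$name\n";
    return $code->(@args);
}

package My::Handler;    # hypothetical handler holding per-request data

sub new {
    my ($class, %req) = @_;
    return bless { req => \%req }, $class;
}

sub get_request {
    my ($self, $field) = @_;
    return $self->{req}{$field};
}

sub register {
    my ($self, $urn) = @_;
    # The closure captures $self, so the engine never needs to know about it.
    main::register_function($urn, 'get_request', sub { get_request($self, @_) });
    return;
}

package main;

my $urn = 'urn:example:handler';    # hypothetical namespace URI
My::Handler->new(uri => '/index.html')->register($urn);
print call_function($urn, 'get_request', 'uri'), "\n";
```

The handler object is never stored anywhere except inside the closure, which keeps it alive; that is presumably why the callback gets built inside handler(), where the request object is in scope.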
Re:Perl XPath functions (Score:2)
--Nat
LibXML and HTML (Score:2)
Re:LibXML and HTML (Score:2)
--Nat
Re:LibXML and HTML (Score:2)
Have you tried HTML::TreeBuilder with Class::XPath [cpan.org]?
Re:LibXML and HTML (Score:2)
Ah yes, I've known about XPath for three days. Why wouldn't I assume I've had an original thought? :-)
--Nat
Re:LibXML and HTML (Score:2)
Re:LibXML and HTML (Score:2)
Thanks!
--Nat