All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
XPath (Score:3, Informative)
document() function to retrieve a document by name), this little tidbit finds all of the links containing .html in the href, fetches them, parses them, and returns the title of each page. A spider. In one expression.
Assign that to a nodeset and reapply the expression, and you're going two levels out. (Or just nest the document() functions into something really contorted.)
Did you try munging the HTML with tidy first? That works a decent amount of the time. (You can have tidy emit XML/XHTML if you don't want to deal with HTML parsers.)
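For anyone who wants to see the shape of that spider outside of XSLT, here is a rough Python sketch of the same idea. It is not the original expression. The PAGES dict, load(), html_links(), linked_titles() and spider() are all made-up names for illustration, load() only stands in for document() by reading from an in-memory dict instead of the network, and the pages are assumed to be well-formed XHTML already (say, after a pass through tidy).

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "site": document name -> well-formed XHTML text.
PAGES = {
    "index.html": "<html><head><title>Index</title></head><body>"
                  "<a href='a.html'>A</a> <a href='b.html'>B</a> "
                  "<a href='notes.txt'>notes</a></body></html>",
    "a.html": "<html><head><title>Page A</title></head><body>"
              "<a href='c.html'>C</a></body></html>",
    "b.html": "<html><head><title>Page B</title></head><body/></html>",
    "c.html": "<html><head><title>Page C</title></head><body/></html>",
}


def load(name):
    """Stand-in for document(): look up a document by name and parse it."""
    return ET.fromstring(PAGES[name])


def html_links(doc):
    """Return the href of every <a> whose href contains '.html'."""
    return [a.get("href") for a in doc.iter("a") if ".html" in a.get("href", "")]


def linked_titles(doc):
    """One level out: load each linked page and return its title."""
    return [load(href).findtext("head/title") for href in html_links(doc)]


def spider(name, depth):
    """Reapply the link-following step so titles come from `depth` levels out."""
    docs = [load(name)]
    for _ in range(depth - 1):
        docs = [load(href) for d in docs for href in html_links(d)]
    return [title for d in docs for title in linked_titles(d)]


if __name__ == "__main__":
    print(linked_titles(load("index.html")))
    print(spider("index.html", 2))
```

Calling spider() with depth 2 is the "assign that to a nodeset and reapply" step: the pages found on the first pass become the input to the second.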
Re:XPath (Score:2)
(Morbus, you getting this for Spidering Hacks? :-)
--Nat
Re:XPath (Score:1)
Re:XPath (Score:1)