Thursday November 22, 2007
09:05 PM
Scraping sibling nodes with Web::Scraper.
Web::Scraper does not handle some cases well, like the following:
<div class="author">miyagawa</div>
<div class="module">Web::Scraper</div>
<div class="author">hanekomu</div>
<div class="module">Dist-Joseki</div>
This is not a tree structure: each module is a sibling of its author, not a child. Hmm... Web::Scraper depends on the tree structure, doesn't it?
But XPath is a Swiss Army chainsaw.
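XPath by itself really can express this. As a minimal sketch, assuming HTML::TreeBuilder::XPath (the XPath backend Web::Scraper is built on) is installed, each author node can reach its module through the `following-sibling` axis, as long as the expression is evaluated relative to that node. The sample HTML is the snippet above wrapped in a page; `@modules` is just an illustrative name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<html><body>
<div class="author">miyagawa</div>
<div class="module">Web::Scraper</div>
<div class="author">hanekomu</div>
<div class="module">Dist-Joseki</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my @modules;
for my $author ($tree->findnodes('//div[@class="author"]')) {
    # The leading "." keeps the path relative to the current author node,
    # so the sibling axis sees the author's real neighbours.
    my ($module) = $author->findnodes('./following-sibling::div[1][@class="module"]');
    push @modules, {
        author => $author->as_text,
        title  => $module ? $module->as_text : undef,
    };
}
$tree->delete;    # free the HTML::Element tree; @modules holds plain strings

printf "%s => %s\n", $_->{author}, $_->{title} // '(none)' for @modules;
```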
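scraper {
    # nested scraper: intended to run once per author div
    process '//div[@class="author"]', 'modules[]', scraper {
        # intended: the text of the current author div itself
        process '/.', 'author', 'TEXT';
        # intended: the next div sibling, if it has class="module"
        process '/following-sibling::div[1][@class="module"]', 'title', 'TEXT';
    }
};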
But this code doesn't work: Web::Scraper cannot handle sibling lookups this way.
If Web::Scraper supported this feature, you could scrape 'search.cpan.org', 'blog.livedoor.com', and many other web sites more easily.
The following is a quick and dirty patch for this problem:
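Without patching anything, one workaround that stays inside the stock `scraper`/`process` API is to collect the two kinds of `div` as separate lists and pair them by position. This is a different technique from the patch, sketched under the assumption that the markup strictly alternates (exactly one module per author, in order); it breaks if an author has no module, so the sketch dies on a count mismatch rather than silently mis-pairing.

```perl
use strict;
use warnings;
use Web::Scraper;

my $html = <<'HTML';
<html><body>
<div class="author">miyagawa</div>
<div class="module">Web::Scraper</div>
<div class="author">hanekomu</div>
<div class="module">Dist-Joseki</div>
</body></html>
HTML

# Collect both kinds of div as two flat lists, in document order.
my $s = scraper {
    process '//div[@class="author"]', 'authors[]' => 'TEXT';
    process '//div[@class="module"]', 'titles[]'  => 'TEXT';
};
my $res = $s->scrape(\$html);

my @authors = @{ $res->{authors} || [] };
my @titles  = @{ $res->{titles}  || [] };

# Pairing by index is only safe when the counts match.
die "author/module count mismatch\n" unless @authors == @titles;

# "+{" forces an anonymous hash instead of a block.
my @modules = map { +{ author => $authors[$_], title => $titles[$_] } } 0 .. $#authors;
```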
http://limilic.com/entry/c3qpikckc7f12jq3