tokuhirom (7396)

http://d.hatena.ne.jp/tokuhirom/

Journal of tokuhirom (7396)

Thursday November 22, 2007
09:05 PM

Scraping sibling nodes with Web::Scraper

[ #34958 ]
Web::Scraper does not handle some cases well, such as the following:

  <div class="author">miyagawa</div>
  <div class="module">Web::Scraper</div>
  <div class="author">hanekomu</div>
  <div class="module">Dist-Joseki</div>

This is not a tree structure: each author and its module are sibling divs, not children of a common wrapper element. Hmm... Web::Scraper depends on the tree structure, doesn't it?

But XPath is a Swiss Army chainsaw.

  scraper {
    process '//div[@class="author"]', 'modules[]', scraper {
      process '/.', 'author', 'TEXT';
      process '/following-sibling::div[1][@class="module"]', 'title',  'TEXT';
    }
  };

But this code doesn't work. Web::Scraper cannot handle this pattern.
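The sibling lookup itself is ordinary XPath, so one possible stopgap is to skip the scraper DSL and call HTML::TreeBuilder::XPath (the parser Web::Scraper depends on) directly. The following is only a minimal sketch under that assumption: the helper name pair_authors_with_modules is hypothetical, and the relative expression uses './' rather than the '/' written above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of Web::Scraper): pair each author <div>
# with the module <div> that immediately follows it.
sub pair_authors_with_modules {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;

    my @modules;
    for my $author ($tree->findnodes('//div[@class="author"]')) {
        # Relative XPath, evaluated with the author node as context.
        my ($module) = $author->findnodes(
            './following-sibling::div[1][@class="module"]'
        );
        next unless $module;    # skip an author with no module right after it
        push @modules, {
            author => $author->as_text,
            title  => $module->as_text,
        };
    }

    $tree->delete;    # free the parse tree
    return \@modules;
}

my $html = <<'HTML';
<div class="author">miyagawa</div>
<div class="module">Web::Scraper</div>
<div class="author">hanekomu</div>
<div class="module">Dist-Joseki</div>
HTML

for my $m (@{ pair_authors_with_modules($html) }) {
    print "$m->{author}: $m->{title}\n";
}
```

This returns the same author/title pairs the scraper above was meant to collect under 'modules[]', at the cost of leaving Web::Scraper's declarative style.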

If Web::Scraper supported this feature, you could scrape 'search.cpan.org', 'blog.livedoor.com', and many other web sites more easily.

The following is a quick and dirty patch for this problem:
http://limilic.com/entry/c3qpikckc7f12jq3