The Ruby library scrAPI looks promising. It lets you write scraper code using CSS selectors, like:
require 'scrapi'
require 'open-uri'  # lets open(url) fetch a remote page
scraper = Scraper.define do
  process 'span.title > a:first-child', :title => :text, :url => '@href'
  process 'ul.list-circle > li:first-child > a', :category => :text
  result :title, :url, :category
end
html = open(url).read
scraper.scrape(html)
In Plagger's EntryFullText module and the like, we use regular expressions and/or XPath to extract this kind of information, and I think adding CSS selectors would be neat too.
Is there already a Perl module on CPAN that does something similar? I searched but couldn't find any. CSS.pm doesn't do this.
We could write a CSS-to-XPath translator (Score:1)
Re: (Score:1)
Re: (Score:1)
Yes.
Re: (Score:1)
Googling "CSS selector to XPath" gives me very few results:
http://groups.google.com/group/behaviour/browse_thread/thread/246782199cea5ce9/
http://www.joehewitt.com/blog/2006-03-20.php [joehewitt.com]
Re: (Score:1)
It should not be very hard. There are not many selectors in CSS2 and they just need to be translated once. Maybe I should write up the equivalents.
Re: (Score:1)
Here you go: How to map CSS selectors to XPath queries [plasmasturm.org].
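To give a feel for how small such a translator could be, here is a rough sketch in Ruby (the language of the scrAPI example above) that maps a handful of CSS2 selectors to XPath 1.0. The names css_to_xpath, compound_to_xpath and CLASS_TEST are made up for illustration and are not part of scrAPI or any CPAN module. It only assumes type, class, id, [attr], [attr=value] and :first-child selectors joined by the descendant and child combinators; attribute values containing spaces or ">" are not handled.

```ruby
# Hypothetical helpers for illustration only; not part of scrAPI or CPAN.

# XPath test for "the class attribute contains this whole word".
CLASS_TEST = "contains(concat(' ', normalize-space(@class), ' '), ' %s ')"

# Translate one compound selector such as "span.title" or "a:first-child".
def compound_to_xpath(token)
  m = token.match(/\A([a-zA-Z][\w-]*|\*)?(.*)\z/)
  name = m[1] || '*'   # no element name means "any element"
  parts = m[2].scan(/#[\w-]+|\.[\w-]+|\[[^\]]+\]|:[\w-]+/)
  raise ArgumentError, "unsupported selector: #{token}" unless parts.join == m[2]

  predicates = parts.map do |part|
    case part
    when /\A#(.+)\z/ then "@id='#{$1}'"
    when /\A\.(.+)\z/ then format(CLASS_TEST, $1)
    when /\A\[([\w-]+)\]\z/ then "@#{$1}"
    when /\A\[([\w-]+)=["']?([^"'\]]*)["']?\]\z/ then "@#{$1}='#{$2}'"
    when ':first-child' then 'not(preceding-sibling::*)'
    else raise ArgumentError, "unsupported selector part: #{part}"
    end
  end
  name + predicates.map { |pred| "[#{pred}]" }.join
end

# Translate a whole selector, handling the " " and ">" combinators.
def css_to_xpath(selector)
  axis = '//'   # the first step searches the whole document
  xpath = +''
  selector.strip.gsub(/\s*>\s*/, ' > ').split.each do |token|
    if token == '>'
      axis = '/'    # child combinator
    else
      xpath << axis << compound_to_xpath(token)
      axis = '//'   # plain whitespace is the descendant combinator
    end
  end
  xpath
end

puts css_to_xpath('span.title > a:first-child')
puts css_to_xpath('ul.list-circle > li:first-child > a')
```

For the second selector this prints //ul[contains(concat(' ', normalize-space(@class), ' '), ' list-circle ')]/li[not(preceding-sibling::*)]/a. The class test is padded with spaces so that "list-circle" does not also match a class such as "list-circle-big". A Perl port could follow the same structure and hand the resulting string to whatever XPath engine is already in use.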
Re: (Score:1)
What about CSS 3 Selectors (Pseudo classes)? [w3.org] Looks like html/selector.rb implements some of those, e.g.
Porting html/selector.rb to perl (Score:1)
Re: (Score:2)