The Ruby library scrAPI looks promising. It lets you write scraper code using CSS selectors, like this:
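require 'open-uri'
require 'scrapi'

scraper = Scraper.define do
  process 'span.title > a:first-child',
    :title => :text, :url => '@href'
  process 'ul.list-circle > li:first-child > a',
    :category => :text
  result :title, :url, :category   # fields the scraper returns
end

html = open(url).read              # url is the page to scrape
result = scraper.scrape(html)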
In Plagger's EntryFullText module and the like, we use regular expressions and/or XPath to extract this kind of information, and I think adding CSS selectors would be neat too.
Is there already a Perl module on CPAN that does something similar? I searched for one but couldn't find any. CSS.pm doesn't do this kind of thing.
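Since the extraction code already speaks XPath, one way CSS selector support could be layered on is to translate a selector into an equivalent XPath expression first. Below is a rough sketch of that idea, written in Ruby to match the example above. The names css_to_xpath and step_to_xpath are made up for illustration, and it only handles tag names, .class, #id, :first-child, and the descendant and child combinators. It is not how scrAPI itself works and is nowhere near a full CSS selector implementation.

```ruby
# Sketch only: translate a tiny subset of CSS selectors to XPath.
# css_to_xpath and step_to_xpath are hypothetical helper names.

# Convert one compound selector such as "span.title" or "a:first-child"
# into an XPath step (node test plus predicates).
def step_to_xpath(step)
  m = /\A([a-zA-Z][\w-]*|\*)?((?:[.#][\w-]+|:first-child)*)\z/.match(step)
  raise ArgumentError, "unsupported selector step: #{step}" unless m

  tag = m[1] || '*'
  predicates = m[2].scan(/[.#][\w-]+|:first-child/).map do |part|
    case part[0, 1]
    when '.'
      # Match one class name inside a whitespace-separated class attribute.
      "[contains(concat(' ', normalize-space(@class), ' '), ' #{part[1..-1]} ')]"
    when '#'
      "[@id='#{part[1..-1]}']"
    else
      # :first-child means the element has no preceding element sibling.
      '[not(preceding-sibling::*)]'
    end
  end
  tag + predicates.join
end

# Convert a whole selector. Whitespace becomes the descendant axis (//)
# and ">" becomes the child axis (/).
def css_to_xpath(selector)
  xpath = ''.dup
  axis = '//' # the first step may match anywhere in the document
  selector.strip.scan(/\s*>\s*|\s+|[^\s>]+/) do |token|
    case token
    when /\A\s*>\s*\z/ then axis = '/'
    when /\A\s+\z/ then axis = '//'
    else xpath << axis << step_to_xpath(token)
    end
  end
  xpath
end

puts css_to_xpath('span.title > a:first-child')
puts css_to_xpath('ul.list-circle > li:first-child > a')
```

The resulting strings could then be handed to whatever XPath engine the extraction code already uses, so a selector would just be a friendlier spelling of the same query.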