All the Perl that's Practical to Extract and Report

  • I am just going to do some scraping work, and Web::Scraper (W::S) works great so far. The docs are lacking, though; the examples you posted in past journal entries helped! I have a few questions:

    1. The example from the docs has:
      process "h3.ens>a",
      where ens seems to be doing wildcard matching: any class name that contains ens.
    2. The HTML page contains UTF-8 characters such as è, which made HTML::Parser complain:
      Parsing of undecoded UTF-8 will give garbage when decoding entities
      HTML::Parser mentioned encoding the data.
    • 1. If you want wildcard matching, you can change the selector expression to something like ".ens>a".
      2. Web::Scraper does whatever it can to decode UTF-8 characters back to Unicode, as long as you pass a URI object and the HTML page has a correct Content-Type header. Otherwise you need to fetch the page into a variable and call Encode::decode to get the Unicode characters back.
      3. The result keyword specifies which stash variable you want to get as a result. You can omit it if you want th
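
A minimal sketch of points 2 and 3, in case it helps. It assumes the page is UTF-8 and that you already have the raw bytes in a variable; the HTML snippet, the /cafe link and the stash key name titles are made up for illustration. The bytes are decoded with Encode::decode before being handed to the scraper as a string, and result picks a single stash variable to return.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Web::Scraper;

# Raw UTF-8 bytes, standing in for an undecoded response body.
# (Hypothetical page; \xC3\xA8 is the UTF-8 encoding of "è".)
my $bytes = qq{<html><body><h3 class="ens"><a href="/cafe">caf\xC3\xA8</a></h3></body></html>};

# Turn the bytes into Perl characters before the HTML gets parsed.
my $html = decode('UTF-8', $bytes);

my $scraper = scraper {
    # Collect the text of every matching link into the "titles" stash variable.
    process 'h3.ens>a', 'titles[]' => 'TEXT';
    # Return only that stash variable instead of the whole stash.
    result 'titles';
};

# scrape() also accepts an HTML string, not only a URI object.
my $titles = $scraper->scrape($html);

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @$titles;
```

If you pass a URI object instead and the server sends a correct Content-Type header, the explicit decode step should not be needed.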