TorgoX (1933)

TorgoX
  sburkeNO@SPAMcpan.org
http://search.cpan.org/~sburke/

"He is as beautiful as the retractility of the talons of birds of prey [...] and above all, as the chance meeting on a dissecting table of a sewing machine and an umbrella!" -- Lautréamont

Journal of TorgoX (1933)

Tuesday December 07, 2004
09:08 PM

Scr.*?nscraping

[ #22199 ]
Dear Log,

So when I go to write another HTML scraper like this, I often start by copying one block of the HTML that repeats up and down the template-generated page (the block whose repetitions I want to capture), and I paste it into the STDIN of this little utility. I hit Return and Control-D, and out comes a big dumb regexp that loosely matches that piece of input. I take out the bits of text that I know will vary, replace them with (.*?) or the like, and voilà, screenscraper.
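For readers without the utility at hand, here is a minimal sketch of the same idea. It is not the original tool: the sub name loose_regexp and its loosening rules (quotemeta on literal text, \s+ for every run of whitespace) are assumptions. The second half imitates the hand-editing step on a made-up two-row table, with (.*?) swapped in for the parts that vary.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original utility: turn a pasted HTML
# sample into a loose regexp. Assumed rules: literal text is escaped
# with quotemeta, and every run of whitespace becomes \s+.
sub loose_regexp {
    my ($sample) = @_;
    $sample =~ s/^\s+|\s+\z//g;          # drop leading/trailing whitespace
    my @parts = split /(\s+)/, $sample;  # the capture keeps whitespace runs
    return join '', map { /\A\s+\z/ ? '\s+' : quotemeta } @parts;
}

# A repeated row, as it might be copied from a template-generated page.
my $row = qq{<tr><td class="title">Some Title</td>\n  <td>\$9.99</td></tr>};
print loose_regexp($row), "\n";

# After pasting that output into a scraper, the varying bits (title and
# price) are hand-replaced with (.*?), and //g walks every repetition.
my $page = join "\n",
    qq{<tr><td class="title">First</td>\n  <td>\$1.00</td></tr>},
    qq{<tr><td class="title">Second</td>\n  <td>\$2.50</td></tr>};
my @rows;
while ($page =~ m{\<tr\>\<td\s+class\=\"title\"\>(.*?)\<\/td\>\s+\<td\>\$(.*?)\<\/td\>\<\/tr\>}g) {
    push @rows, [$1, $2];
}
print "$_->[0]: $_->[1]\n" for @rows;
```

To use it as a STDIN filter like the one described above, a script could end with print loose_regexp(do { local $/; <STDIN> }), "\n"; and you would paste the HTML in, then hit Return and Control-D.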

  • This one is definitely going in the toolbox. Thanks!

    -sam

  • s/([\.\(\)\^\$\@\[\]\*\?\+\{\}\#\\])/\\$1/g;

    Is that any different from $_ = quotemeta $_; ?

    • I think quotemeta is a bit more verbose: it seems to quote every character that's not \w.
      • Ah, yes. Actually, I just remembered that quotemeta is in fact problematic for this purpose, because backslashes aren't treated the same way in a regex-quoting context as in a double-quote-quoting context. That annoyed me when I was trying to interpolate user data into the s/// pattern of a string to be evaled.
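A small demonstration of the difference discussed in this thread, assuming the hand-rolled substitution is used on its own exactly as quoted: its character class doesn't list |, so alternation survives the escaping, while quotemeta backslashes it (along with the space). The last part shows \Q...\E, which applies quotemeta to interpolated text inside a pattern, as one possible way to avoid building an s/// string for eval at all. The sub name class_escape and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# The substitution quoted in the comment above, wrapped as a sub.
# Its class does not include '|', so that character is left unescaped.
sub class_escape {
    my ($s) = @_;
    $s =~ s/([\.\(\)\^\$\@\[\]\*\?\+\{\}\#\\])/\\$1/g;
    return $s;
}

my $text = 'a|b (c)';

my $by_class = class_escape($text);  # a|b \(c\)
my $by_qm    = quotemeta $text;      # a\|b\ \(c\)

# The class-escaped pattern still treats '|' as alternation, so it
# accepts a lone "a"; the quotemeta version matches only the literal text.
my $lone_a_class = ('a' =~ /\A(?:$by_class)\z/) ? 1 : 0;
my $lone_a_qm    = ('a' =~ /\A(?:$by_qm)\z/)    ? 1 : 0;

# Interpolating the raw user string with \Q...\E escapes it in place,
# so no code string needs to be generated and evaled.
my $user   = 'a|b (c)';
my $target = 'x a|b (c) y';
(my $edited = $target) =~ s/\Q$user\E/[found]/;

print "$by_class\n$by_qm\n$lone_a_class $lone_a_qm\n$edited\n";
```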