  • It's also one of the slower engines, and at least in Ruby 1.6 its \G doesn't prohibit regex bump-along (it means "start of current match" rather than "end of last match"), which makes it relatively useless for writing complex parsers.

    Personally, I'm waiting for Inline::Perl6 ;-)

    • I'm curious about this. Do you have an example that demonstrates this? And does it behave the same in Ruby 1.8?
      • When trying to match abcde with /\Gx?/g, the first match succeeds: no x is found, but the question mark allows zero characters to be consumed. This match ends zero characters into the string, that is, at start-of-string. To avoid looping forever on zero-length matches, the engine then retries the match one position further along the string.

        In Perl, \G means end-of-last-match, and since end-of-last-match was at start-of-string, \G can't possibly match at one character into the string:

        $ perl -le'$_="abcde"; s/\Gx?/!/g; print'
        !abcde
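
        For contrast, here is a small sketch (not part of the original post; the variable names are just illustrative) that runs the same global substitution in Perl with and without \G. Without the anchor, the engine is free to bump along and match the empty string at every position.

```perl
use strict;
use warnings;

my $anchored   = "abcde";
my $unanchored = "abcde";

# \G pins each attempt to the end of the previous match,
# so only the first (zero-length) match at offset 0 can succeed.
$anchored =~ s/\Gx?/!/g;

# No anchor: after each zero-length match the engine bumps along
# and finds another empty match at the next position.
$unanchored =~ s/x?/!/g;

print "$anchored\n";      # !abcde
print "$unanchored\n";    # !a!b!c!d!e!
```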

        In Ruby (both 1.6 and 1.8, I found), \G merely means start-of-current-match, which, of course, is satisfiable at that point:

        $ ruby1.6 -e'puts "abcde".gsub(/\Gx?/,"!")'
        !a!b!c!d!e!
        $ ruby1.8 -e'puts "abcde".gsub(/\Gx?/,"!")'
        !a!b!c!d!e!

        Perl's \G is a powerful tool for writing parsers because the regex engine is prohibited from skipping characters to find a match. You can work your way through a string by applying a multitude of patterns to it in turn, using /gc (the /c keeps a failed match from resetting the end-of-last-match position), without them sabotaging each other.
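
        A minimal sketch of that style of parser, assuming a made-up toy grammar of numbers, identifiers and a few operators (the tokenize function and the token names are hypothetical, chosen only for illustration). Each pattern is anchored with \G and uses /gc, so a failed alternative leaves pos() alone and the next pattern is tried at exactly the same offset.

```perl
use strict;
use warnings;

# Hypothetical tokenizer for a toy expression language.
sub tokenize {
    my ($src) = @_;
    my @tokens;

    while (1) {
        # Every pattern starts with \G, so it can only match where the
        # previous successful match ended; /c preserves pos() on failure.
        if    ($src =~ /\G\s+/gc)            { next }                        # skip whitespace
        elsif ($src =~ /\G(\d+)/gc)          { push @tokens, [NUM   => $1] }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, [IDENT => $1] }
        elsif ($src =~ /\G([-+*\/=()])/gc)   { push @tokens, [OP    => $1] }
        elsif ($src =~ /\G\z/gc)             { last }                        # reached end of input
        else {
            # Nothing matched at the current offset: report it instead of
            # silently skipping ahead, which an unanchored pattern would do.
            die sprintf "Unexpected character at offset %d\n", pos($src) // 0;
        }
    }
    return @tokens;
}

my @tokens = tokenize("x = 42 + y");
print join(" ", map { sprintf "%s(%s)", @$_ } @tokens), "\n";
# IDENT(x) OP(=) NUM(42) OP(+) IDENT(y)
```

        With an engine whose \G allows bump-along, the else branch could not be relied on: a pattern could skip over the offending character and match further along the string.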