Ovid (2709)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Thursday September 09, 2004
02:29 PM

The Power of Ruby's Regex Engine

[ #20800 ]

At last night's Perl Mongers meeting, we again had some Ruby folk around to show us a bit of their language of choice. One thing I found interesting was their regex engine. Apparently it's reentrant. They have the same regex variables we do but those are merely copies of the relevant data. They are apparently not used to determine the state of the engine. Instead, each regex is assigned its own "match" object. As a result, a regex can embed a code block which in turn calls more regexes without blowing up. You can't do this in Perl.

As a result, I've been toying with the idea of using Inline::Ruby because one of my pet projects could benefit from this (yes Adrian, I hear you telling me to use the call stack :)

Wouldn't it be a bit ironic that I'd be forced to use Inline::Ruby to take advantage of the power of Ruby's regular expressions?
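
The "each regex gets its own match object" idea above can be sketched roughly as follows. This is only an illustration: the sample strings and variable names are made up, and it does not show a code block embedded inside a pattern. It only shows that match state lives in separate MatchData objects, so a regex run inside a block that is itself driven by a regex doesn't disturb the outer one.

```ruby
# Sketch only: sample strings and variable names are hypothetical.

# Regexp#match returns a MatchData object that belongs to that one match.
outer = /(\w+)@(\w+)/.match("ovid@example")

# A second, unrelated match returns its own object and leaves `outer` alone.
inner = /(\d+)/.match("journal 20800")

# A regex used inside a block that is itself driven by a regex (gsub here).
# The captures are copied to locals first, because $1/$2 are only convenience
# copies of the most recent match and the inner match overwrites them.
result = "a1 b2".gsub(/([a-z])(\d)/) do
  letter, digit = $1, $2
  kind = /[aeiou]/.match(letter) ? "v" : "c"
  "#{letter.upcase}#{digit}#{kind}"
end

puts outer[1]   # ovid
puts inner[1]   # 20800
puts result     # A1v B2c
```

Holding on to the MatchData object (or copying the captures into locals, as in the block) is what keeps the outer match usable after the inner one has run.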

  • If not for the kleptomaniac nature of the Perl language we wouldn't have all those nifty features to play with. Including regular expressions.

    And the world rightfully became impressed with the Perl regexp engine. Good for them.

    Why shouldn't we steal back recursive regexps from Ruby? (modulo implementation difficulties)
  • It's also one of the slower engines, and at least in Ruby 1.6 its \G doesn't prohibit regex bump-along (it's "start of current match" rather than "end of last match"), which makes it relatively useless for writing complex parsers.

    Personally, I'm waiting for Inline::Perl6 ;-)

    • I didn't know about the \G issue, but the slowness doesn't faze me for one simple reason: slow but working versus fast but broken is a sure win in my book. :)

      • Of course! If you need recursive regexen, fast but broken is obviously useless. Just don't disregard that you can talk about working vs broken only with regard to recursion. For me, Ruby's engine is just as useless because of its \G behaviour as Perl's engine is for you because of its non-reentrancy.

        Tool for the job and all that I guess.. :-)

    • I'm curious about this. Do you have an example that demonstrates this? And does it behave the same in Ruby 1.8?
      • When trying to match abcde with /\Gx?/g, the first match is successful, because no x is found but the question mark allows zero characters to be consumed. This match ends zero characters into the string, at start-of-string. In order to avoid infinite loops on zero-length matches, the engine then retries the match one position further down the string.

        In Perl, \G means end-of-last-match, and since end-of-last-match was at start-of-string, \G can't possibly match at one character into the string:

        $ p