

demerphq (2831)

http://www.perlm ... l?node_id=108447

Perlmonk. Perl5 Regex Hacker. Telecoms Billing Specialist. Canadian living in Germany.

Journal of demerphq (2831)

Sunday December 03, 2006
11:43 AM

Perl5 Regex Engine Abstracted - Win A Metre of Beer!

[ #31796 ]

With patch Change #29430 the regex engine should now be sufficiently abstracted that it is reasonably feasible to write a plug-in that uses a different regex engine in perl. Most of the interface is documented in perlreguts, although one needs to understand the flags in regexp.h in order to make it work.

The basic idea is that the existing perl engine and its data structures have been ripped in half. The structures with which the perl core must interact are now well defined, and the parts that are "private" to a given regex engine implementation are now isolated from the core.

A big issue with the regex engine pluggability interface in Perl 5.8 and earlier was that it was intended mostly for swapping in a DEBUG build of the real engine, with the expectation that any engine in use could cope with the data structures generated by any other engine. Additionally, some of the management routines for regexps were hard-coded into the core, with the expectation that the core routine could handle any engine's data. Together these design issues meant that you basically couldn't plug an arbitrary engine into perl; even the meanest implementation would have required core patches.

In the new scheme, what Perl needs to know about a pattern is defined by struct regexp, and what Perl needs to know about a regex engine is defined by struct regexp_engine. All a plug-in needs to do is create a regexp_engine containing the appropriate callbacks and populate %^H appropriately when the module is use()d. Perl will then use the compile routine specified by the engine to create a regexp struct, which it will then use as though it were its own. The regexp structure contains a pointer to the regexp_engine structure that created it, so the regexp is completely encapsulated: you can compile a regexp using one engine, pass it to a routine expecting a normal regexp, and everything will work out.

So now all we need is some intrepid hacker to actually put all this to good use and write a plug-in for another engine.

And you know what?

The first person who publishes a proper regexp engine plug-in for BleadPerl (soon to be Perl 5.10) will get a METRE OF BEER for doing so...

I'll give a bonus prize (to be determined) if the engine ported is the latest Tcl engine.

So let's see how long it takes... :-)

Note: I'm going to publish this challenge elsewhere over the next few days. Thanks to AdamK for the idea of the prize.

  • From #p5p:

    17:21 <@audreyt> I need to sleep.
    17:21 <@audreyt> and here is my entry:
    17:21 <@audreyt> http://perlcabal.org/~audreyt/tmp/re-engine-y2k-0.01.tar.gz
    17:21 <@audreyt> enjoy :)
    17:21 <@dmq> i think you win. :-)
    • Ha! Hilarious. OK, so you win. But I guess I should have been more specific about what I meant.

      So to clarify: to qualify for a prize you have to implement a real, functional, NON-PERL (language or core) regexp engine as a plug-in.

      Meaning PCRE, the TCL 4 engine, PGE...

      So the show ain't over, folks! There's still beer to be won.

      • Hi, I wonder what the other dimensions of that metre are. Are they

        * of capillary size?

        * approximately the size of the cross-section of a monster Bavarian mug? ...

        * not specified? ... which will make us, well, very drunk (and AdamK poorer). (Actually, in Madrid, where I live, more and more people say m when they mean m**2, especially when they talk about apartments.)

        I remember seeing it mentioned in Jeffrey F.'s book that the latest engine by Henry Spencer (possibly used in >tcl8.4) w

        • Funny that you assume AdamK is paying for this metre of beer, too. That was only last time.
        • Is there a special reason why you mention the Tcl libraries, besides looking for a proof of concept of your latest work?

          Because of the hoo-rahing that it got from Friedl. And because, to the best of my knowledge, there are only a few C regexp engines up to the job, and the Tcl engine is the one I know of that is furthest from what Perl itself does.

          With regard to the rest, either I don't know, or we will see later. :-)

        • And this, folks, is why you need to be VERY specific in challenges for prizes.

          No, I'm not paying for this one :)

          But my standard was to purchase crates/cartons/slabs of "your favourite non-ridiculously-expensive beer" (as in whatever you like, but don't screw me over) :) and pile those crates up until they were a metre high.

          Of course, in retrospect I should have said "at least" or "no more than" :) So for stennie's CamelPack prize we approximated by going low and throwing in a six-pack.

          But trust me, it's
  • It's on CPAN now as re::engine::PCRE.

    You can also download it from http://perlcabal.org/~audreyt/tmp/re-engine-PCRE-0.01.tar.gz.

    Enjoy!

    • Well, it looks like that one did it.

      Congratulations Audrey!

      Now it needs more tests and some loving care to go from proof of concept to fully usable, but I think we have proved the framework is not crippled from the get-go.

      Thanks a lot, it looks like this is a great leap forward.