demerphq (2831)

http://www.perlm ... l?node_id=108447

Perlmonk. Perl5 Regex Hacker. Telecoms Billing Specialist. Canadian living in Germany.

Journal of demerphq (2831)

Thursday September 14, 2006
01:06 PM

Regex hacking

[ #30996 ]

I recently left my job and have been decompressing from a long period of continuous employment by enjoying some time off with some nice weather, bike riding, late mornings, breakfast at the cafe and lots of hacking.

I've managed to close off most of my regex engine todo list: ANYOF and jump tries, Aho-Corasick startclass matching, positive-look(ahead|behind) optimisation (there are still some rough edges to deal with on this one), charname support in the parser, and about 75% of what needs to happen to the debug output to make it "non-regex engine hacker" friendly.

All this, combined with my earlier efforts (single-char-ANYOF to EXACT, simple TRIEs), means that perl now comfortably outperforms Python in the so-called "rebench" tests (bleadperl average time per test: 28 usecs; Python average time per test: 297 usecs). Using regex preprocessors to do "trie"-like optimisation will no longer be necessary, and in fact will often slow things down, as it duplicates work that is already happening internally.
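To make the point about preprocessors concrete, here is a minimal sketch of the kind of pattern this affects: a plain alternation of literal strings, built directly from a word list with no prefix-factoring step. The word list and the helper name is_known_word are made up for illustration; nothing here is taken from the rebench suite.

```perl
use strict;
use warnings;

# Hypothetical word list, chosen only because the entries share prefixes.
my @words = qw(foo foobar food bar baz);

# A plain alternation of literals. The list is NOT preprocessed into a
# hand-factored pattern such as fo(?:o(?:bar|d)?)|ba[rz].
my $alternation = join '|', map { quotemeta($_) } @words;
my $re = qr/\A(?:$alternation)\z/;

# Returns 1 if the whole string is one of the words, 0 otherwise.
sub is_known_word {
    my ($string) = @_;
    return $string =~ $re ? 1 : 0;
}

for my $candidate (qw(foo food fool baz)) {
    printf "%-5s %s\n", $candidate,
        is_known_word($candidate) ? "match" : "no match";
}
```

Of the four candidates, only "fool" fails to match. To see what the engine makes of the alternation, the same script can be run with the re pragma's debug mode (perl -Mre=debug script.pl); on a perl that includes the trie work, the compiled program should show a TRIE node in place of a chain of BRANCH nodes, though the exact output differs between versions.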

I need to revisit perlreguts and update it with what I've learned and, in some cases, what has changed.

I need to update my journal more often. :-(

Things still to do: clean up/reorganize the re Debug stuff to be easier to use; look into using reghop4() for things; look into the MAGIC hack for making things pluggable; look into migrating the re_dup code in sv.c over to regcomp.c; maybe split regcomp into several pieces to make it smaller and easier to manage.

  • I've been meaning to add a review of perlguts to perltodo.pod. There have been quite a few refactorings, new bits of API, and new macros meant to make life easier for Perl 5.10. This is something that a lot of people actually need to look into and make suggestions on.
  • There really must be some award for the incredible work you've done on the regexp engine. For a long time it was feared that the only person left who could hack on it was Ilya Zakharevich (well, that was my fear) and you've really taken up the banner and pushed perl's regexp implementation into fantastic new territories. I can't wait for perl 5.10 now!
    • You know, it's comments like this one that really get me motivated. Often in p5p circles it's hard to tell how one's work is received. Everybody is so experienced (and in some ways jaded....) that they tend not to give feedback like this. So hearing you say this really warms my heart. Thank you, and I'm pleased that you like it.

      • I'm late to this journal comment -- but I agree with Matt. This is really cool code, and I hope we can use it effectively in SpamAssassin; I'm certainly trying ;)
      • I wonder if they're jealous of the time and energy you have to do such work. I certainly am, but I look forward to using it.

  • That sounds pretty awesome. How does PCRE compare?