Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

robin (1821)

  (email not shown publicly)

Journal of robin (1821)

Tuesday April 09, 2002
08:59 AM

More regex

[ #4054 ]
[Note: There is a full account of my recursive regex idea in this article.]

I've found the bug in my PCRE patch, which is partly to do with the way * repetitions are handled. But you don't actually need to use iterative repetitions any more, because you can replace iteration with recursion! /a*/ can be rewritten as /((a(?1))?)/. And if you do that, you sometimes avoid triggering the bug. So you can test for matching XML-style tags like this:


I'll fix the bug soon...

I've also managed to prove that all context-free languages can indeed be expressed. The proof takes the form of an algorithm for turning a context-free grammar into a regex:

  • Eliminate left recursion from the grammar. (This is a standard procedure, but quite complicated.)
  • Write the grammar as a system of equations. For example, the grammar
    • S --> ''
    • S --> '(' T ')' S
    • T --> S
    • T -> 'x' T


    • S = '' + '(' T ')' S
    • T = S + 'x' T
  • Now work out the least solution for S in terms of a least fixpoint operator µ. In our example that is
    • µS. ('' + '(' µT.(S+'x'T) ')' S )
  • And translate the µ-expression into a regex:


  • (That regex doesn't actually work yet, because of various bugs. But it ought to.)

Of course, the interesting part is proving that the algorithm really works. I plan to write it up in more detail soon.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.