NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

## All the Perl that's Practical to Extract and Report

#### robin (1821)

robin
(email not shown publicly)

### Journal of robin (1821)

Tuesday April 09, 2002
08:59 AM

### More regex

[ #4054 ]
[Note: There is a full account of my recursive regex idea in this article.]

I've found the bug in my PCRE patch, which is partly to do with the way * repetitions are handled. But you don't actually need to use iterative repetitions any more, because you can replace iteration with recursion! /a*/ can be rewritten as /((a(?1))?)/. And if you do that, you sometimes avoid triggering the bug. So you can test for matching XML-style tags like this:

&#163;^(<\w+/>|<(\w+)>([^<>]|(?1)|)(?3)</\2>)\$&#163;

I'll fix the bug soon...

I've also managed to prove that all context-free languages can indeed be expressed. The proof takes the form of an algorithm for turning a context-free grammar into a regex:

• Eliminate left recursion from the grammar. (This is a standard procedure, but quite complicated.)
• Write the grammar as a system of equations. For example, the grammar
• S --> ''
• S --> '(' T ')' S
• T --> S
• T -> 'x' T

becomes

• S = '' + '(' T ')' S
• T = S + 'x' T
• Now work out the least solution for S in terms of a least fixpoint operator µ. In our example that is
• µS. ('' + '(' µT.(S+'x'T) ')' S )
• And translate the µ-expression into a regex:

/(|\(((?1)|x(?2))\)(?1))/

• (That regex doesn't actually work yet, because of various bugs. But it ought to.)

Of course, the interesting part is proving that the algorithm really works. I plan to write it up in more detail soon.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.