I'm working on a project of a certain vintage that uses upwards of five programming languages to get stuff done. Annoying, but nowhere near uncommon. (There's a story about how JScheme is included in the JDK sources, because the code to generate CORBA classes for Java is written in Scheme...)
Luckily for me, I had a feature to add that traces through most of those languages all at once: Haskell -> Tcl -> XSLT -> Tcl. (The Perly bits form the backend system, not the frontend runtime components.) Thankfully, it was a simple fix: add wordbreak barriers around a regex being output from a Haskell program and sent upstream to heaven knows where.
Should be simple, right? Just replace "..." with "\\b(...)\\b", or some variant thereof. Easy peasy.
Except that the \b metacharacter is Perl syntax, and the regex isn't going to be processed by Perl. At one point, I thought that this regex was going to be processed by a component written in C, using the GNU Regex library. Turns out that Perl, GNU Regex and PCRE all agree that \b is a word boundary. (POSIX regexes don't appear to know what a word boundary is...)
Yet none of the standard regex magic was working. Tracing through the code, I discovered that Tcl's regex engine was the one being used (by way of XSLT; about as convenient as a direct flight from Sydney to New York by way of Mars).
Looking over Tcl's regex docs, it turns out that \b is a backspace character!
Because matching backspace characters is such a common operation within a regex, Tcl preserves the C-style escape for \b, and uses \y for word boundaries.
WHAT ON EARTH WERE THEY THINKING?
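For anyone else bitten by this, here is a minimal sketch of the idea in Python (not the project's actual Haskell): choose the word-boundary escape according to which engine will finally evaluate the regex. The dialect labels and the wrap_word_boundaries helper are made up for illustration; the only facts assumed are the ones above, namely that Perl, PCRE and GNU Regex use \b while Tcl uses \y.

```python
import re

# Word-boundary escape for each regex dialect mentioned in the post.
# The dialect keys are hypothetical labels invented for this sketch.
WORD_BOUNDARY = {
    "perl": r"\b",
    "pcre": r"\b",
    "gnu": r"\b",
    "tcl": r"\y",  # in Tcl, \b means backspace; \y is the word boundary
}


def wrap_word_boundaries(pattern: str, dialect: str) -> str:
    """Wrap pattern in a group with the dialect's word-boundary escape on each side."""
    wb = WORD_BOUNDARY[dialect]
    return f"{wb}({pattern}){wb}"


# Python's own re module follows the Perl convention, so the "perl"
# variant can be exercised directly here.
perl_style = wrap_word_boundaries("cat", "perl")
print(perl_style)                                   # \b(cat)\b
print(bool(re.search(perl_style, "a cat sat")))     # True
print(bool(re.search(perl_style, "concatenate")))   # False

# The Tcl variant is only built as a string; Python's re would reject \y.
print(wrap_word_boundaries("cat", "tcl"))           # \y(cat)\y
```

Note that the added parentheses form a capture group, so any downstream code that counts groups would see the numbering shift by one.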
From the MRE book (Score:1)
So the Tcl people weren't wrong, just di
Prior art (Score:1)
So, Perl is the usurper, not Tcl. That's not to say that Perl is wrong, of course.
Re:Prior art (Score:1)
Not wrong, just different. The problem comes when we Perl folks try to hold up Perl as "the" regex standard when it is not. The MRE book explains that different languages' regex implementations are a wee bit different from each other.
Re:Prior art (Score:2)
The problem is that Perl's regex syntax is adopted as the gold standard whenever another language/library needs to beef up its regex handling. There's a very large common subset shared between Perl, PCRE, GNU Regex, and probably some Java library. In general, this is a good thing, because it means that regexes generally become normalized, at least for the common cases. There should be one (common) way to find word boundaries, but all bets are off on variable capture and executing c
perhaps necessary (Score:2)