Journal of avar (6604)

Tuesday October 16, 2007
10:10 PM

The CL backend for the kp6 Perl 6 compiler making headway

The Common Lisp backend for KindaPerl6 has been making a lot of progress since I last wrote about it.

We now have working variables, subroutines, lexicals (and closures), methods and other nice stuff. The full report is on the pugs blog.

Saturday September 22, 2007
07:23 PM

Perl 6 to machine code via Common Lisp and sbcl

Aankhen, fglock and I have recently been working on a Common Lisp emitter for KindaPerl6. KindaPerl6 is a self-compiling compiler written in a subset of Perl 6 with support for multiple emitter backends. Until now it has been running on the Perl 5 backend, but the Common Lisp backend is now getting up to speed.

Hello world is about the only thing the CL backend can compile so far, but it already does so quite impressively. I've posted an update to a recent thread on pugs.blogs.com comparing how fast Perl 6 compiled to machine code via sbcl runs against the kp6 Perl 5 backend and the Parrot nqp compiler.

Common Lisp has other implementations you could do interesting things with; for instance, you could run the same program "on the metal" under movitz. If someone wanted to implement an operating system in Perl, now would be a good time to start :)

Friday August 10, 2007
05:58 PM

split // will be around 3x as fast in 5.10

After a recent patch of mine to blead, split // is around three times as fast as it was; it's even faster than unpack! I'll be giving a lightning talk on this at YAPC::EU 2007 if the conference organizers will let me (the slides are at "slides/index.html").
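To try the split-versus-unpack comparison on your own perl, a rough sketch using the core Benchmark module might look like this. The test string and the unpack template '(a)*' are my own choices, not necessarily what the talk used, and the numbers will vary with perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A 10,000-character ASCII string; the length is an arbitrary choice.
my $str = join '', map { chr(97 + $_ % 26) } 0 .. 9_999;

# Both idioms turn a string into a list of single characters.
my @via_split  = split //, $str;
my @via_unpack = unpack '(a)*', $str;

# Time each idiom for roughly one CPU second and print a comparison table.
cmpthese(-1, {
    'split //'      => sub { my @chars = split //, $str },
    "unpack '(a)*'" => sub { my @chars = unpack '(a)*', $str },
});
```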

Friday June 22, 2007
11:49 PM

Partly compatible regular expressions

Since I finished my changes to what will become the pluggable regex
API in perl 5.10 I've been working on writing re::engines again: first
one for the Plan 9 regex library, and then finishing the PCRE one that
audrey and yves started (but didn't quite finish).

I based the new PCRE wrapper on the Plan 9 one and upgraded the
underlying PCRE library to 7.2. Aside from a bug in how split is
handled, it works for most of the cases where the Perl engine does.

With PCRE wrapped, running Perl's own regex tests under PCRE becomes
really easy. There are almost 1300 tests for the regex syntax in
t/op/regexp.t in perl core. Running these under re::engine::PCRE
reveals the following incompatibilities (and some bugs) between it and
Perl:

(?{}) and (??{}) tests fail, since both constructs run Perl code in
the middle of a match. Getting at least (?{}) to work might be possible
with PCRE's callout mechanism, but I haven't looked closely at that.
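For anyone who hasn't used them: (?{}) and (??{}) embed Perl code that the perl interpreter itself runs during the match, which is why a C library like PCRE can't support them on its own. A small illustration under plain perl (the variable names are only illustrative):

```perl
use strict;
use warnings;

# (?{ ... }) runs Perl code when the engine reaches that point of the match.
our $reached_b = 0;
my $matched = "abc" =~ /a(?{ $reached_b = 1 })bc/;

# (??{ ... }) runs Perl code and matches its result as a sub-pattern:
# the digit captured in $1 decides how many "x" characters must follow.
my $counted   = qr/^(\d)(??{ 'x' x $1 })$/;
my $three_ok  = "3xxx" =~ $counted;   # true: three x's follow the 3
my $three_bad = "3xx"  =~ $counted;   # false: only two x's follow the 3
```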

A few tests such as "bbbbXcXaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" =~
/.X(.+)+X/ fail because PCRE's backtracking recurses until it hits its
internal MATCH_LIMIT. Using pcre_dfa_exec() instead of pcre_exec()
yields a match, but the DFA routine has different matching semantics.
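For comparison, a minimal check under plain perl (not re::engine::PCRE) that Perl's own engine gets through the same test. The only place the trailing X can match is right after the "c", so the group captures just "c":

```perl
use strict;
use warnings;

# The subject and pattern from the failing test above.
my $subject = "bbbbXcXaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

# Nested quantifiers like (.+)+ invite exponential backtracking.
my @groups = $subject =~ /.X(.+)+X/;
```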

One test fails because PCRE treats [a-[:digit:]] as an invalid range,
while Perl takes it to mean a character class matching 'a', '-' or any
[:digit:] character. Perl is probably being too permissive in this case.
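A small sketch of how Perl itself reads that class. It assumes a perl that reports the range only as a "False [] range" warning (category 'regexp'), which is silenced here:

```perl
use strict;
use warnings;

# Perl warns about the false range a-[:digit:] and then treats the '-'
# as a literal, so the class means: 'a', '-', or any digit.
my $class = do { no warnings 'regexp'; qr/^[a-[:digit:]]$/ };

my @accepted = grep { $_ =~ $class } ('a', '-', '5', 'b');
```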

"aba" =~ /^(a(b)?)+$/; say "$1-$2" will yield "a-b" under PCRE but
"a-" under Perl. That is, PCRE keeps the "b" that (b) captured on the
first iteration of the +, while Perl leaves $2 unset because (b)?
didn't match on the final iteration. Both match the entire string.
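The capture difference is easy to see under plain perl. A sketch, assuming a 5.10 or later perl with the stock engine:

```perl
use strict;
use warnings;

# The outer group repeats twice: "ab", then "a". On the final iteration
# the optional (b) doesn't participate, and Perl leaves $2 unset.
"aba" =~ /^(a(b)?)+$/ or die "no match";
my ($outer, $inner) = ($1, $2);
```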

Perl accepts quantifiers on a negative lookahead, e.g. /foo(?!bar){2}/,
but PCRE doesn't. I couldn't get Perl to do anything useful with that,
though. /foo(?!(?:bar){2})/ works in both engines and doesn't match
"foo" followed by "barbar" (bar{2} on its own would only repeat the "r").
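A short sketch under plain perl showing the difference between quantifying only the last letter and quantifying the whole group inside the lookahead:

```perl
use strict;
use warnings;

# bar{2} repeats only the "r", so this lookahead rejects "foo" before "barr".
my $not_barr   = qr/foo(?!bar{2})/;
# (?:bar){2} repeats the whole word, so this one rejects "foo" before "barbar".
my $not_barbar = qr/foo(?!(?:bar){2})/;

my $r_on_foobarr   = "foobarr"   =~ $not_barr;     # false
my $r_on_foobarbar = "foobarbar" =~ $not_barr;     # true: "barb" is not "barr"
my $g_on_foobarbar = "foobarbar" =~ $not_barbar;   # false
my $g_on_foobar    = "foobar"    =~ $not_barbar;   # true: only one "bar"
```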

PCRE does not match <<!>!>!>><>>!>!>!> against
^(<(?:[^<>]+|(?3)|(?1))*>)()(!>!>!>)$ but Perl does. I haven't looked
into why.
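Under plain perl the match succeeds, with group 1 covering everything before the trailing !>!>!>. One plausible but unverified explanation for the PCRE failure: perlre notes that Perl can backtrack into a recursed group, whereas PCRE treats a recursed group as atomic. A minimal check:

```perl
use strict;
use warnings;

my $subject = "<<!>!>!>><>>!>!>!>";

# (?1) recurses into group 1 and (?3) into group 3 (a forward reference).
my $matched = $subject =~ /^(<(?:[^<>]+|(?3)|(?1))*>)()(!>!>!>)$/;
my $group1  = $1;
```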

PCRE does not support (*FAIL) or its synonym (*F), which cause the
pattern to fail at that point, nor does it support (*ACCEPT).
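Both verbs exist in Perl 5.10. A sketch adapted from the examples in perlre (the counting trick and the 'AB' pattern), runnable under plain perl:

```perl
use strict;
use warnings;

# (*FAIL) forces a backtrack wherever it appears. Paired with (?{ }), it
# makes the engine try every way a+b? can match somewhere in 'aaab'.
our $count = 0;
my $never = 'aaab' =~ /a+b?(?{ $count++ })(*FAIL)/;

# (*ACCEPT) ends the match successfully where it appears, so the D and
# the (E) group are never reached; group 3 stays unset.
my $accepted = 'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
my ($outer, $inner, $last) = ($1, $2, $3);
```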

Three tests try to match \x{85} against \R in a UTF-8 upgraded string
("\302\205") with a pattern that wasn't compiled with PCRE_UTF8. This
isn't a PCRE issue but an API usage problem in re::engine::PCRE; the
best solution is probably to upgrade all patterns and strings to
UTF-8 before calling pcre_compile/pcre_exec.
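A sketch of what an upgraded \x{85} looks like under the hood, using perl's built-in utf8::upgrade and utf8::encode functions. It only illustrates the representation; it isn't re::engine::PCRE code:

```perl
use strict;
use warnings;

my $nel = "\x{85}";     # NEXT LINE (NEL), a single character
utf8::upgrade($nel);    # same character, now stored as UTF-8 internally

# Copy it and expose the internal UTF-8 encoding as plain bytes.
my $bytes = $nel;
utf8::encode($bytes);

# Perl's own engine still sees one vertical-whitespace character.
my $perl_sees_linebreak = $nel =~ /^\R$/;
```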

PCRE accepts capture names that start with a digit, such as
^(?'0'ook)$, ^(?<0>ook)$ and ^(?<1a>ook)$. These all match the literal
string "ook" and set up the named capture "0" or "1a". Perl does not
currently accept named buffers that start with a digit.
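A sketch of the Perl side of this difference. The digit-initial pattern is compiled at run time inside eval so the error can be caught; a letter-initial name (written with the (?'NAME'...) form) works and shows up in %+:

```perl
use strict;
use warnings;

# Perl requires capture-group names to start with a non-digit character.
my $digit_name = '^(?<0>ook)$';
my $compiled   = eval { my $re = qr/$digit_name/; 1 };
my $error      = $@;

# A name starting with a letter is fine.
my $named_ok = "ook" =~ /^(?'word'ook)$/;
my $word     = $+{word};
```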

re::engine::PCRE doesn't support multiple named capture buffers with
the same name, while Perl does. At first I thought this was a PCRE
limitation, but it turns out that I just didn't know about the
PCRE_DUPNAMES option :)
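Under plain perl, duplicate names behave like this: %+ gives the leftmost group of that name that actually matched, and %- gives all of them in order. A sketch with a made-up cat/dog pattern:

```perl
use strict;
use warnings;

# Two groups share the name "animal"; only one can take part in a match.
my $pattern = qr/^(?:(?'animal'cat)|(?'animal'dog))s?$/;

"dogs" =~ $pattern or die "no match";
my $first_defined = $+{animal};          # leftmost group named animal that matched
my @all           = @{ $-{animal} };     # every group named animal, matched or not
```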

So aside from inline eval, re::engine::PCRE is pretty much a drop-in
replacement for Perl's engine. Perhaps more significantly, PCRE's
compatibility can now be tested (and errors fixed) by running it
against Perl's own test suite.

To run Perl's tests under PCRE, get blead and re::engine::PCRE 0.10
(coming to a CPAN near you), build it and run:

    perl5.9.5 -Mblib t/perl/regexp.t

By default it skips the failing tests; these can currently be enabled
by commenting out line 86 in regexp.t:

    @pcre_fail{@pcre_fail} = ();
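That line is a hash slice: it turns the list of known-failing test numbers into hash keys so the test loop can skip them with a cheap exists check. The surrounding regexp.t code isn't shown here, so the test numbers and the loop below are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical list of test numbers known to fail under PCRE.
my @pcre_fail = (3, 7, 8);

# Hash slice: each element of @pcre_fail becomes a key of %pcre_fail
# (with an undef value).
my %pcre_fail;
@pcre_fail{@pcre_fail} = ();

my (@ran, @skipped);
for my $test (1 .. 10) {
    if (exists $pcre_fail{$test}) {
        push @skipped, $test;   # a real test file would report a skip here
        next;
    }
    push @ran, $test;
}
```

Commenting the slice out leaves %pcre_fail empty, so nothing is skipped.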