


Journal of robin (1821)

Monday January 16, 2006
06:46 AM

Does everything come back to Perl?

This morning I got the 'Pataphysics CD in the post, from the Sonic Arts Network. There's some good stuff on there, including some of Luc Etienne's phonetic palindromes, and Nigey Lennon's strange homage to Alfred Jarry, The Man with the Axe.

Idly googling for merdre, what should I see on the first page of results but this. Though, on reflection, maybe it's not so strange after all. Perl is surely one of the more 'pataphysical programming languages.

Thursday June 30, 2005
06:48 PM

Hacking on Want

It's been a while since I've done any serious hacking on anything Perl-related. Yesterday I woke up to a message from Damian Conway, reporting a subtle bug in my Want module. I haven't been very good at responding to bug reports of late (most of them are EBKAC or known bugs), but I think Damian has earnt the right to be taken seriously.

It took me most of the day to track down the bug and fix it, but it was an interesting journey. What he found is that, if you call want() from (a sub that's called from) within the guard of a loop, it crashes the second time through.

It turns out that this happened because of a subtle design flaw in Want. Perl doesn't really have any proper introspection capabilities, so modules like Want have to be cunning and take advantage of data that's around for other reasons. To decide what context a sub is called in, Want locates the part of the optree where the sub is called, and then trawls it to find the essence of the expression the sub call is in. (For example, foo() + 2 means foo is called in numeric context, whereas foo() && 2 means it's called in boolean context.)
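
For comparison, here is roughly what core Perl offers on its own. This is only a sketch using the built-in wantarray (the sub name foo and the @seen array are just for illustration): it can tell list, scalar and void context apart, but it gives the same answer for the numeric and the boolean call, which is the gap a module like Want has to fill by other means.

```perl
use strict;
use warnings;

my @seen;

# Record the context reported by the built-in wantarray:
# true => list, defined-but-false => scalar, undef => void.
sub foo {
    push @seen, wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    return 1;
}

my $num  = foo() + 2;     # numeric use: wantarray just says "scalar"
my $bool = foo() && 2;    # boolean use: also just "scalar"
my @list = foo();         # list context
foo();                    # void context

print "@seen\n";          # prints: scalar scalar list void
```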

There's no easy way (that I know of) to find the right part of the optree, but there are various bits of information around that give enough of a clue. The activation record for a sub records the last statement that was executed before the sub call, and the address the sub should return to. So I walk the optree, starting at the last statement, until I find the return address; then I know where the sub must have been called from.

The second time through a loop, however, it can happen that the last statement executed is after the return point, so it keeps walking and walking but never finds what it's looking for.

It took me a while to see how to fix it, but in the end I found a way. It so happens that loops, as well as subroutines, leave an activation record on the context stack, so the new code does this: after it's found the activation record for the sub, it keeps looking up the stack to see if there's a loop around the sub call. If there is, the optree walk starts at the beginning of the loop instead. That seems to fix it.
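
The failure and the fix can be pictured with a toy model. This is emphatically not Want's real code (which works on perl's internal structures in C); the op names and the %next_op table are invented for illustration, and the model is a simple chain of ops in execution order.

```perl
use strict;
use warnings;

# Toy model: each op points to the op that follows it.
# All names here are hypothetical stand-ins for real optree nodes.
my %next_op = (
    stmt_before_loop => 'loop_start',
    loop_start       => 'guard_call',    # the loop guard calls the sub
    guard_call       => 'return_point',  # where the sub returns to
    return_point     => 'body_stmt',
    body_stmt        => 'loop_end',
    loop_end         => undef,
);

# Walk forward from $start looking for $target. Returns the number of
# steps taken, or undef if the walk runs off the end without finding it.
sub find_return_point {
    my ($start, $target) = @_;
    my $steps = 0;
    for (my $op = $start; defined $op; $op = $next_op{$op}) {
        return $steps if $op eq $target;
        $steps++;
    }
    return;
}

# First time through: the last statement executed precedes the loop.
my $first = find_return_point('stmt_before_loop', 'return_point');

# Second time through: the last statement executed is in the loop body,
# which lies after the return point, so the walk never finds it.
my $second = find_return_point('body_stmt', 'return_point');

# The fix: if a loop encloses the call, start from the loop's beginning.
my $fixed = find_return_point('loop_start', 'return_point');

for my $r ([first => $first], [second => $second], [fixed => $fixed]) {
    my ($name, $steps) = @$r;
    printf "%-6s %s\n", $name,
        defined $steps ? "found after $steps steps" : 'not found';
}
```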

I'm just waiting for Damian to give the all clear before I release the new version.

Saturday May 25, 2002
09:57 AM

You know you're starting to think in O'Caml when...

You write Perl code like this:

sub gron {
  my ($f, $total, $width) = @_;
  my $veet;
  $veet = sub {
    my ($partial, $subtotal, $n) = @_;
    my $rem = $total - $subtotal;
    if ($n+1 == $width) {
      $f->($rem, @$partial);
    }
    else {
      $veet->([$_, @$partial], $subtotal+$_, $n+1) for 0..$rem;
    }
  };
  $veet->([], 0, 0);
}

gron(sub {print($_ ? $_ : " ") for @_; print "\n"}, 3, 27);

I'm fairly sure that the idea of using a recursive closure in Perl has never crossed my mind before. Notice the disguised conses as well :-)
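
To see what gron enumerates, a smaller call helps. The sketch below repeats the sub from above with comments added, and collects the results instead of printing them; the arguments 2 and 2 and the @rows array are chosen only for illustration. Each call to $f receives $width non-negative integers that sum to $total.

```perl
use strict;
use warnings;

sub gron {
    my ($f, $total, $width) = @_;
    my $veet;
    $veet = sub {
        # $partial: values chosen so far (newest first), $subtotal: their sum,
        # $n: how many values have been chosen.
        my ($partial, $subtotal, $n) = @_;
        my $rem = $total - $subtotal;
        if ($n + 1 == $width) {
            # Last position: it must take whatever remains.
            $f->($rem, @$partial);
        }
        else {
            # Try every possible value for the next position and recurse.
            $veet->([$_, @$partial], $subtotal + $_, $n + 1) for 0 .. $rem;
        }
    };
    $veet->([], 0, 0);
}

my @rows;
gron(sub { push @rows, join(',', @_) }, 2, 2);
print "$_\n" for @rows;    # prints 2,0 then 1,1 then 0,2
```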

Thursday May 23, 2002
09:50 AM

Source filtering, the O'Caml way

Recently I've been playing with O'Caml, which is a charming language.

It has a source filtering mechanism called camlp4. At the heart of it is an extensible replacement parser, which makes it almost trivial to change or extend the language. One of the examples in the manual adds a new loop construct in six lines of code.

Of course, camlp4 itself is written not in ordinary O'Caml but in the "revised" (formerly "righteous") syntax invented by the author of camlp4.

It's interesting that several of the "big" changes planned for Perl 6 are already features of O'Caml: extensible syntax, currying, stable multithreading.

Oh, and it's (conceivably) faster than C++.

Saturday April 20, 2002
07:52 AM

More recursion

I've rewritten my recursive regex implementation, and I think it actually works properly at last.

I have started wondering about the feasibility of replacing perl's regex engine with PCRE. The regex engine is supposedly pluggable already, but it looks as though plugging in a completely different regex engine would still be non-trivial. Any thoughts?

Friday April 12, 2002
10:18 AM

OS X filename oddity

Filenames in Darwin are UTF-8, so I wondered what would happen if you use a nonsensical sequence. It seems that you can make weird disappearing files, which show up in the Finder when you first open the folder they're in, and then quickly disappear. Sometimes they don't actually disappear until you try and click on them, which is tantalising.

If you have an OS X machine try this: perl -e 'mkdir("foo\xED\xA0\x80bar") or die $!'
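
The sequence is nonsensical because the bytes ED A0 80 are what you get by UTF-8-encoding U+D800, a surrogate code point, which well-formed UTF-8 does not allow. A small sketch with the core Encode module shows a strict decoder rejecting it; the helper name is_strict_utf8 and the comparison string are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# True if $bytes is well-formed under Encode's strict 'UTF-8' encoding.
# We work on a copy, since decode() with a CHECK value may modify its argument.
sub is_strict_utf8 {
    my ($bytes) = @_;
    my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 };
    return $ok ? 1 : 0;
}

my $odd   = "foo\xED\xA0\x80bar";   # the name used in the one-liner
my $plain = "foo\xC3\xA9bar";       # for comparison: "foo", e-acute, "bar"

printf "odd name:   %s\n", is_strict_utf8($odd)   ? 'valid' : 'invalid';
printf "plain name: %s\n", is_strict_utf8($plain) ? 'valid' : 'invalid';
```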

07:18 AM

Ccard

There is a card game based on category theory.

Tuesday April 09, 2002
01:32 PM

Other Cam(e)l book?

Apparently O'Reilly is bringing out a book on OCaml. I wonder what animal they'll use for the cover...

08:59 AM

More regex

[Note: There is a full account of my recursive regex idea in this article.]

I've found the bug in my PCRE patch, which is partly to do with the way * repetitions are handled. But you don't actually need to use iterative repetitions any more, because you can replace iteration with recursion! /a*/ can be rewritten as /((a(?1))?)/. And if you do that, you sometimes avoid triggering the bug. So you can test for matching XML-style tags like this:

£^(<\w+/>|<(\w+)>([^<>]|(?1)|)(?3)</\2>)$£

I'll fix the bug soon...
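
The claim that /a*/ can be rewritten with recursion is easy to try out in a perl that understands the (?1) syntax (5.10 and later), rather than in the patched PCRE described here. This is just a sketch: the ^ and $ anchors and the test strings are added for the comparison, and the variable names are arbitrary.

```perl
use strict;
use warnings;

my $iterative = qr/^a*$/;
my $recursive = qr/^((a(?1))?)$/;   # group 1 is: optionally, an "a" followed by group 1 again

my @disagreements;
for my $s ('', 'a', 'aaaa', 'ab', 'b') {
    my $it  = $s =~ $iterative ? 1 : 0;
    my $rec = $s =~ $recursive ? 1 : 0;
    push @disagreements, $s if $it != $rec;
    printf "%-6s iterative=%d recursive=%d\n", "'$s'", $it, $rec;
}
print @disagreements ? "patterns disagree\n" : "patterns agree\n";
```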

I've also managed to prove that all context-free languages can indeed be expressed. The proof takes the form of an algorithm for turning a context-free grammar into a regex:

  • Eliminate left recursion from the grammar. (This is a standard procedure, but quite complicated.)
  • Write the grammar as a system of equations. For example, the grammar
    • S --> ''
    • S --> '(' T ')' S
    • T --> S
    • T --> 'x' T

    becomes

    • S = '' + '(' T ')' S
    • T = S + 'x' T
  • Now work out the least solution for S in terms of a least fixpoint operator µ. In our example that is
    • µS. ('' + '(' µT.(S+'x'T) ')' S )
  • And translate the µ-expression into a regex:

    /(|\(((?1)|x(?2))\)(?1))/

  • (That regex doesn't actually work yet, because of various bugs. But it ought to.)

Of course, the interesting part is proving that the algorithm really works. I plan to write it up in more detail soon.
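
For anyone who wants to experiment, the regex from the example can be tried in a perl with (?1)-style recursion (5.10 and later), instead of the patched PCRE discussed above. The sketch below is only an experiment, not a proof of anything: the anchors are added so that the whole string must match, and the test strings are my own picks from inside and outside the language generated by S.

```perl
use strict;
use warnings;

# Group 1 plays the role of S, group 2 the role of T:
#   S = '' + '(' T ')' S
#   T = S + 'x' T
my $s_regex = qr/^(|\(((?1)|x(?2))\)(?1))$/;

my @in_language  = ('', '()', '(x)', '(())', '(xx())()');
my @not_language = ('(', 'x', ')(', '(x', '()x');

my @wrong;
for my $str (@in_language) {
    push @wrong, "'$str' should match" unless $str =~ $s_regex;
}
for my $str (@not_language) {
    push @wrong, "'$str' should not match" if $str =~ $s_regex;
}

print @wrong ? join("\n", @wrong) . "\n" : "all strings classified as expected\n";
```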

Monday April 01, 2002
05:01 PM

Regex power!

I've been hacking on PCRE. I love coding in C: it's so clean and crisp and quick.

I've added an interesting extension to the syntax. Would this be a good idea for Perl?

/^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/i