masak (6289)

Been programming Perl since 2001. Found Perl 6 somewhere around 2004, and fell in love. Now developing November (a Perl 6 wiki), Druid (a Perl 6 board game), pls (a Perl 6 project installer), GGE (a regex engine), and Yapsi (a Perl 6 implementation). Heavy user of and irregular committer to Rakudo.

Journal of masak (6289)

Thursday July 15, 2010
11:13 AM

Phasers are a blast: FIRST and LAST

I started thinking about the FIRST and LAST phasers the other day, thanks to moritz++. My attention was on how to implement them in Yapsi, and my conclusions were mostly expressed in SIC (Yapsi's instruction code), but they translate readily into Perl 6 for public view.

For those who haven't kept up with the latest Perl 6 terminology, "phasers" are what we call those all-caps blocks which fire at different phases during program execution. Perl 5's perldoc perlmod simply calls them "specially named code blocks", but in Perl 6 it's been decided to call them "phasers".

So much for phasers. What do the FIRST and LAST phasers do? They don't exist in Perl 5. S04 describes them thus:

FIRST {...}       at loop initialization time, before any ENTER
 LAST {...}       at loop termination time, after any LEAVE

(There's a NEXT phaser too, which I'm not going to tackle today. The ENTER and LEAVE phasers are what they sound like; they trigger at block entrance and exit, respectively.)

Here's some code using these.

my @a = 1, 2, 3;
for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;
    LAST { say "LOL DONE" }
}

The code, when run, should print the following:

OH HAI
1
2
3
LOL DONE

(At the time of writing, no Perl 6 implementation implements the FIRST and LAST phasers yet.)

The goal of this post is to transform the phasers into code that uses more primitive constructs but still produces the above results. Oh, and it should work not only in this case, but in general.

Here's a first attempt. (Phaser-ful code to the left, rewritten code to the right.) It doesn't work.

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              say "OH HAI";
for @a -> $item {             for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;                    say $item;
    LAST { say "LOL DONE" }
}                             }
                              say "LOL DONE";

More exactly, it does produce the desired output, but it doesn't work in general; it fails when @a is empty:

my @a;                        my @a;
                              say "OH HAI";
for @a -> $item {             for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;                    say $item;
    LAST { say "LOL DONE" }
}                             }
                              say "LOL DONE";

This code would still produce "OH HAI\nLOL DONE\n", which is wrong, because there is no first and last iteration for the empty @a array.

Ok, we say. No worries; it's a bit more ad hoc, but we can check for emptiness. No problem.

my @a;                        my @a;
                              my $HAS_ELEMS = ?@a;
                              if $HAS_ELEMS {
                                  say "OH HAI";
                              }
for @a -> $item {             for @a -> $item {
    FIRST { say "OH HAI" }
    say $item;                    say $item;
    LAST { say "LOL DONE" }
}                             }
                              if $HAS_ELEMS {
                                  say "LOL DONE";
                              }

That works for an empty list, but it fails to work when the FIRST block accesses variables that only exist within the for loop:

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              my $HAS_ELEMS = ?@a;
                              if $HAS_ELEMS {
                                  $x = 42;   # BZZT PARSE ERROR
                              }
for @a -> $item {             for @a -> $item {
    my $x;                        my $x;
    FIRST { $x = 42 }
    say $item, $x;                say $item, $x;
}                             }

So, back to the drawing board. Two seemingly opposing forces constrain our problem: we need to put the rewritten FIRST block outside the for loop, because we only want it to execute once; but we also need to put it inside the for loop, so that it can have access to the same lexical environment. Is there a compromise somewhere in there?

Yes. We put the FIRST block inside the for loop, but then we keep track of whether we've already executed it once, with a special variable hidden in the surrounding scope:

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              my $FIRST_PHASER_HAS_RUN = False;
for @a -> $item {             for @a -> $item {
    my $x;                        my $x;
                                  unless $FIRST_PHASER_HAS_RUN {
    FIRST { $x = 42 }                 $x = 42;
                                      $FIRST_PHASER_HAS_RUN = True;
                                  }
    say $item, $x;                say $item, $x;
}                             }

Now it all works. This is the general way to make FIRST behave according to spec. When several loops occur within the same block, one can reuse the same flag variable for all of them, just resetting it before each one. Explicitly setting it to False even before the first loop is quite important, in case someone ever implements the goto statement.

With the LAST phaser, we encounter exactly the same dilemma as with the FIRST phaser. The LAST phaser has to be both inside and outside the loop block: inside because it has to have access to the loop block's variables, and outside because... well, because in general one doesn't know which iteration was the last one until it has already run.

At one point I had the idea to put the LAST block at the end of the loop block, checking the loop condition just before the placement of the LAST block, possibly saving the result somewhere so it doesn't have to be re-evaluated. But the sad truth is that there's no realistic way to evaluate the loop condition from within the loop block; what if the expression contains a variable which is shadowed by another variable inside the loop block? There's just no way to make that fly.

The whole situation with the LAST block really looks hopeless... until one remembers about closures:

my @a = 1, 2, 3;              my @a = 1, 2, 3;
                              my $LAST_PHASER;
                              my $LOOP_HAS_RUN = False;
for @a -> $item {             for @a -> $item {
    my $x = "LOL DONE";           my $x = "LOL DONE";
    LAST { say $x }               $LAST_PHASER = { say $x };
                                  $LOOP_HAS_RUN = True;
}                             }
                              if $LOOP_HAS_RUN {
                                  $LAST_PHASER();
                              }

So in every iteration, we save away a closure just in case that particular iteration turns out to be the last one. Then, provided the loop ran at all, we execute the last closure that was assigned. Sneaky, huh?

So that works in the general case. Of course, a clever optimizer that can determine with certainty that the loop will run at least once, and that neither phaser uses loop-specific lexicals, is perfectly entitled to rewrite the FIRST and LAST phasers into our first attempt instead.
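To check that the two rewrites combine correctly, here is a minimal sketch in present-day Raku syntax (not Yapsi's SIC). The sub name desugared-loop is made up for illustration, and the phaser bodies push onto an array instead of printing so that the result can be inspected. Each iteration's LAST closure captures its own $x, so only the closure saved by the final iteration runs.

```raku
# Hand-desugared FIRST and LAST for one loop (illustrative sketch).
sub desugared-loop(@a) {
    my @out;
    my $FIRST_PHASER_HAS_RUN = False;     # flag lives outside the loop
    my $LAST_PHASER;                      # overwritten by every iteration
    my $LOOP_HAS_RUN = False;
    for @a -> $item {
        my $x = "LOL DONE after $item";   # a loop-local lexical
        unless $FIRST_PHASER_HAS_RUN {    # FIRST body, inside the loop's scope
            @out.push("OH HAI");
            $FIRST_PHASER_HAS_RUN = True;
        }
        @out.push($item);
        $LAST_PHASER = { @out.push($x) }; # LAST body, closing over this $x
        $LOOP_HAS_RUN = True;
    }
    $LAST_PHASER() if $LOOP_HAS_RUN;      # only the final iteration's closure runs
    @out
}

say desugared-loop([1, 2, 3]);   # [OH HAI 1 2 3 LOL DONE after 3]
say desugared-loop([]);          # []
```

The empty call is the case that broke the very first attempt: with no iterations, neither the flag-guarded FIRST body nor the saved LAST closure ever runs.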

In explaining this to a colleague, a case of possible confusion involving the FIRST phaser was uncovered:

for 1, 2, 3 {
    my $x = 42;
    FIRST { say $x }
}

One might perhaps expect this code to print "42\n", but in fact it prints "Any()". The reason is simple: whereas the lexical $x is reachable throughout the whole for loop, the assignment of 42 to it won't occur until after the FIRST block has executed. That's what FIRST blocks do; they execute first. Nevertheless, some people might expect assignments to be treated specially in some way, not counting as "real code" or whatever. But assignments are real code, and thus that's the result. In general, reading from freshly declared lexical variables in a FIRST block won't do you much good.

Lastly, there's this wording in S04:

FIRST, NEXT, and LAST are meaningful only within the lexical scope of a loop, and may occur only at the top level of such a loop block.

I read that as saying that these kinds of blocks should be illegal if they are found in a block which isn't a loop block. STD.pm6 doesn't enforce this yet; it probably should.

Sunday July 11, 2010
04:55 PM

Iterating your way to happiness with Perl 6

I thought I'd have an easy time today, just regurgitating what S04 says about "Loop statements". Perl 5 already got this part pretty right. Actually, even C got it pretty right. So what's new in Perl 6? That's what this post is about.

So, nothing much has changed about the while and until loops that we know and love.

while EXPR { ... }
until EXPR { ... }

Then there's the kind of loop when you want to test the condition *after* the block has run, rather than before. In Perl 5, that looks like this:

do { ... } while EXPR;
do { ... } until EXPR;

This construct tends to cause fresh Perl programmers a lot of grief, since do isn't really a loop construct. There's some wording about this in perldoc perlsyn:

Note also that the loop control statements described later will NOT work in this construct, because modifiers don’t take loop labels. Sorry.

Perl 6 solves this by

  • recognizing and disallowing while and until after do blocks, and
  • introducing the repeat block.

So now you write it like this instead:

repeat { ... } while EXPR;
repeat { ... } until EXPR;

And you get two bonus features from this: first, since the while or until is mandatory, you can put it on its own line. Generally, closing line-ending curlies act like they have implicit semicolons after them in Perl 6, but here the parser is smart enough to expect a while or until, so it doesn't put one in.

Second, you're allowed to put the condition up front if you want:

repeat while EXPR { ... }
repeat until EXPR { ... }

Even though the condition comes before the block here, it still isn't evaluated until after each iteration, because it's a repeat loop, and that's how repeat loops work.

Then there's the loop construct that loops forever, aptly named loop:

loop { ... }

In C, we'd have that as for (;;) { ... }. And, symmetrically, you can also write it like this in Perl 6:

loop (;;) { ... }

Or, more generally, you can do any C-style for loop, if you just spell it loop:

loop (EXPR; EXPR; EXPR) { ... }
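A small sketch of these loop forms in present-day Raku, assuming a current Rakudo. The bodies record what happened in an @log array (an illustration-only name) rather than printing, which makes the run-once behaviour of repeat easy to see:

```raku
my @log;

# repeat ... while: the condition is tested after the body, so the body
# runs once even though the condition is false from the start.
my $n = 0;
repeat {
    @log.push("repeat body, n=$n");
    $n++;
} while $n < 0;

# The condition can be written up front; it is still tested after each pass.
my $m = 0;
repeat while $m < 0 {
    @log.push("up-front repeat body");
    $m++;
}

# A C-style for loop, spelled "loop".
loop (my $i = 0; $i < 3; $i++) {
    @log.push("i=$i");
}

# A bare loop runs forever; "last" is the way out.
my $k = 0;
loop {
    last if ++$k >= 2;
}
```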

And what was known alternately in Perl 5 as for and foreach becomes just for in Perl 6.

The syntax for for in Perl 6 is what you'd expect:

for EXPR { ... }

But it packs a lot more power underneath. Or rather, the whole language is geared towards packing for with a lot more power. Some examples:

  • If you want to name the item you're currently looping over, just prefix the block with -> $a. (If you don't, you'll find the item in $_, as usual.)
  • If you want to loop two items at a time, prefix the block with -> $a, $b.
  • Neither of the above are special syntaxes belonging to the for construct as such; rather, the -> arrows belong to the block, making it, in the prevailing terminology, a "pointy block". In fact, all of the loop constructs I've brought up (except the C-style loop) can be given pointy blocks; the expression will then be bound to the variable and usable from within the block. You can do it with if statements, too! It's quite useful.
  • If you want to loop over two lists simultaneously, you can use the infix:<Z> operator to interleave the lists. Loop one item at a time, and you'll get alternating elements from the two lists. Loop two at a time, and $a (or whatever) will always be from the first list, and $b from the second. Gone are the days when you had to do manual trickery with indexes and stuff because the for loop wasn't powerful enough.
  • You can even eliminate nested loops sometimes with the infix:<X> operator, which does a Cartesian product (known in SQL circles as a "cross join") of two lists. So if you planned to do for @a { for @b { ... } } anyway, you might as well do for @a X @b { ... } and save yourself one level of indentation.
  • All of the above is lazy, so with a sensible Perl 6 implementation, there's no huge memory waste from building up all these big lists; it's all generated on-the-fly. (This, incidentally, means that we also read files with for loops in Perl 6, rather than with a magical while construct as in Perl 5.)
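Here is a sketch of the for features listed above, again in present-day Raku. One assumption worth flagging: in current Raku, Z yields two-element lists rather than a flat interleaving, so the sketch unpacks each pair with a sub-signature -> ($name, $age) instead of looping two at a time. The variable names are invented for the example.

```raku
my @seen;

# No signature: the current item is in $_.
for <x y> { @seen.push($_) }

# A pointy block names the item; two parameters take two items per pass.
for 1, 2, 3, 4 -> $a, $b { @seen.push("$a+$b") }

# Z walks two lists in step; each element is a two-item list,
# which the sub-signature ($name, $age) unpacks.
my @names = <alice bob>;
my @ages  = 30, 40;
for @names Z @ages -> ($name, $age) { @seen.push("$name=$age") }

# X is the cross product, standing in for two nested loops.
for <a b> X 1, 2 -> ($letter, $number) { @seen.push("$letter$number") }

# A pointy block on if binds the condition's value.
my %age-of = alice => 30;
if %age-of<alice> -> $age { @seen.push("alice is $age") }
```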

That's it for today. I forgot to mention the looping construct that only loops over one item... but you can look that one up yourself. Oh, and Perl 5.10 also has it.

Actually, I got to thinking about all this, since I figured out the other day how to do the FIRST and LAST phasers in Yapsi. But this blog post felt like a natural precursor to the one I wanted to write. Hopefully soon. 哈哈

Friday July 09, 2010
06:25 PM

Weeks 6 and 7 of GSoC work on Buf -- roundtrip

Warning: this blog post doesn't contain any puns at all. It's just a boring update about my progress. If you don't believe me, just go ahead and read it. I dare you.

Been working on file I/O and Bufs lately. It's tough work, but I'm now at a point where things run. Some highlights:

  • I asked on Parrot how to write binary data. Got enough help to get me started coding an IO.write method.
  • I realized that I didn't like at all how the IO spec described IO.write and its reading counterpart. So I rewrote it.
  • Having got the IO.write method working, I wrote a matching read method along the same lines. Here they are.

Now, these new methods obviously write and read bytes to and from files, but the tests indicate that things don't properly round-trip yet. Part of that is because the tests want the example string "föö" to encode as iso-8859-1, but there's no logic that does that yet. My slightly sleepy brain tells me there's more to the story though, or the string wouldn't come back garbled as "fö". (Interesting how that somehow feels like a small piece of mojibake, before the brain even takes in the individual characters.)

Another nice highlight was my question about how Bufs should stringify, which TimToady just answered on #perl6. That should be a smop to implement.

Saturday July 03, 2010
08:28 PM

Dreaming in mixins

Working with pls (a next-gen project installer for the Perl 6 ecosystem), I had a few classes with code like this:

class POC::Tester does App::Pls::Tester {
    method test($project --> Result) {
        my $target-dir = "cache/$project<name>";
        if "$target-dir/Makefile" !~~ :e {
            return failure;
        }
        unless run-logged( relative-to($target-dir, "make test"),
                           :step('test'), :$project ) {
            return failure;
        }

        return success;
    }
}

(success and failure are Result enum values defined elsewhere. They felt like pleasant documentation, and when return type checking works, they'll even help catch errors!)

Now, I wanted to add super-simple progress diagnostics to this method. I wanted an announce-start-of('test', $project); at the start of the method, and either an announce-end-of('test', success); or an announce-end-of('test', failure); at the end, depending on the success or failure of the method.

I have a low threshold for boilerplate. After realizing that I'd have to manually add those calls in the beginning of the method, and before each return — and not only in this method, but in several others — I thought "man, I shouldn't have to tolerate this. This is Perl 6, it should be able to do better!"

So I thought about what I really wanted to do. I wanted some sort of... method wrapper. Didn't really want a subclass, and a regular role wouldn't cut it (because class methods override same-named role methods).

Then it struck me: mixins. Did those already work in Rakudo? Oh well, try it and see. So I created this role:

role POC::TestAnnouncer {
    method test($project --> Result) {
        announce-start-of('test', $project<name>);
        my $result = callsame;
        announce-end-of('test', $result);
        return $result;
    }
}

And then, later: does POC::TestAnnouncer

And it worked! On the first attempt! jnthn++!

(If you're wondering which part of the above method does the wrapping, it's the callsame call in the middle. It delegates back to the overridden method. Note that with this tactic, I get to write my announce-start-of and announce-end-of calls exactly once. I don't have to go hunting for all the various places in the original code where a return is made.)
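To try the technique outside pls, here is a self-contained sketch in present-day Raku. The Result enum, the announce-start-of/announce-end-of subs, and the Tester class are hypothetical stand-ins for the pls code (which isn't shown in full); the announcements go into an @log array so they can be inspected. The mixin is applied to a single object at runtime with does, and callsame re-dispatches to the original Tester.test.

```raku
# Hypothetical stand-ins for pls's Result enum and announce-* helpers.
enum Result <success failure>;
my @log;
sub announce-start-of($step, $name)   { @log.push("start $step $name") }
sub announce-end-of($step, Result $r) { @log.push("end $step $r") }

# Stand-in for POC::Tester: passes only for a project named 'ok-project'.
class Tester {
    method test($project --> Result) {
        $project<name> eq 'ok-project' ?? success !! failure
    }
}

role TestAnnouncer {
    method test($project --> Result) {
        announce-start-of('test', $project<name>);
        my $result = callsame;        # run the method this one overrides
        announce-end-of('test', $result);
        $result
    }
}

my $tester = Tester.new;
$tester does TestAnnouncer;           # runtime mixin on this one object
my $r = $tester.test(%(name => 'ok-project'));
```

Because the role is mixed into one object only, a Tester created without the mixin still runs the original method and logs nothing.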

I guess this counts as using mixins to do Aspect-Oriented Programming. This way of working certainly makes the code less scattered and tangled.

So, in this file, I currently have a veritable curry of dependency injection, behavior-adding roles, lexical subs inside methods, AOP-esque mixins, and a MAIN sub. They mix together to create something really tasty. And it all runs, today, under Rakudo HEAD.

As jnthn said earlier today, it's pretty cool that a script of 400 LoC, together with a 230-LoC module, make up a whole working installer. With so little code, it almost doesn't feel like coding.

Friday July 02, 2010
02:25 PM

Speaking hypothetically in Perl 6

So, arrays and hashes are considered central enough in Perl that they each have their own sigil, as well as a dedicated circumfix constructor:

Type    Sigil   Circumfix
====    =====   =========
Array   @       [ ]
Hash    %       { }

Apart from those, we consider scalars quite important, but they're really "containers of anything", including (references to) arrays and hashes. The $ sigil simply means "untyped". Because of this, there's not really a circumfix constructor.

Type    Sigil   Circumfix
====    =====   =========
Scalar  $       N/A
Array   @       [ ]
Hash    %       { }

But there's one more sigil; one which has had to fight a bit more for its place in the food chain... but this is the one that really makes the hackers over at "Lambda the Ultimate" smile. Introducing the & sigil:

Type    Sigil   Circumfix
====    =====   =========
Scalar  $       N/A
Array   @       [ ]
Hash    %       { }
Block   &       { }

Ok, hold on a minute. Block? A block of what?

So here's the really neat thing. In many situations in perfectly normal, sane programming, we end up wanting to execute some code, just not right now. Just as we'd reach for an array or a hash when we want to collect some structured data for later, we can reach for this block thingy when we want to collect some executable code for later.

If you haven't done this, I can see how it all sounds terribly esoteric, even pointless. You'd go "Just wait until later, and run the code at that point rather than passing around un-run code!", and if a Block was only what I've told you so far, I'd agree with you.

But it's more. A Block is automatically a closure — and this is where people who've grokked this normally use big words (like "it closes over its lexical environment!") and the eyes of people who are struggling to understand glaze over. So I'll go slow.

Take a look at the Counter class in this blog post. Hm, I'll reproduce it here for you:

class LazyIterator {
    has $!it;

    method get() {
        $!it.();
    }
}

class Counter is LazyIterator {
    method new(Int $start) {
        my $count = $start;
        self.bless(*, :it({ $count++ }));
    }
}

The new method contains code both to initialize and to increase $count, but only the initialization code (my $count = $start;) is run. The increasing code ({ $count++ }) is inside a Block, and thus protected from immediate execution. Instead, it's just stored away in the private attribute $!it (for "iterator").

When is $count actually increased? Well, each time we call the LazyIterator.get method, it executes the Block stored in $!it. This all seems perfectly obvious, until one starts to think about how magical it actually is. It increases... what, again? $count? Which is... where, exactly? In the lexical scope of the method, which finished ages ago, and which by the way is in a subclass that wasn't even defined when we defined LazyIterator.get!!!

For that to even have a chance at working, the Block in $!it must "save away" $count from the lexical scope of the method, enough for it to avoid being eaten by an evil garbage collector, etc. This is totally magical! It's as if you opened an empty bottle in orbit around Neptune to let some darkness in, and then whenever you opened the bottle again, no matter where you were, you'd get the same Neptune darkness from within the bottle.

Or like me coming to visit you, but instead of leaving my phone number, I activate one half of an entangled-pair portal in your living room and take the other half with me. Afterwards, I can just scribble whatever I want on my half, and you'd see it instantaneously appear in the other half in your living room. That's how insanely great closures are.
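The Counter example needs a little adjustment to run on a present-day Rakudo: bless no longer takes a leading *, and by default it only initializes public attributes, so this sketch binds the private $!it through a BUILD submethod. Creating two counters shows that each Block keeps hold of its own $count, long after new has returned.

```raku
class LazyIterator {
    has $!it;                       # the stored Block
    submethod BUILD(:$!it) { }      # bind the named argument to the private attribute
    method get() { $!it.() }        # run the stored Block
}

class Counter is LazyIterator {
    method new(Int $start) {
        my $count = $start;         # outlives this call, via the closure
        self.bless(it => { $count++ });
    }
}

my $c1 = Counter.new(5);
my $c2 = Counter.new(100);
my @got = $c1.get, $c1.get, $c2.get, $c1.get;
say @got;   # [5 6 100 7]
```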

One of the very first blog posts I wrote here was about that magical ability of closures to hold on to the environment in which they were created. Be sure to check out the diagram that goes with it, which explains how closures can be used to decouple parts of a large object-oriented system.

In fact, closures (or lambda expressions, same thing) are so general that they have been shown to be universal. That is, anything that a computer algorithm can do, lambda expressions can do, too. (In fact, Alonzo Church developed the lambda calculus and used it to prove the Entscheidungsproblem unsolvable in April 1936, only one month before Alan Turing showed the same with his gedanken state machine. In an addendum published that autumn, Turing showed that lambda calculus and his machine are equal in power.)

By the way, did you notice in the table at the start of the post that both hashes and blocks use the same circumfix constructor, { }? How will you know when you've got a hash and when you've got a block of code?

S04 explains and gives plenty of examples.

$hash = { };
$hash = { %stuff };
$hash = { "a" => 1 };
$hash = { "a" => 1, $b, $c, %stuff, @nonsense };

$code = { %_ };                            # use of %_
$code = { "a" => $_ };                     # use of $_
$code = { "a" => 1, $b, $c, %stuff, @_ };  # use of @_
$code = { ; };
$code = { @stuff };
$code = { "a", 1 };
$code = { "a" => 1, $b, $c ==> print };

Briefly, the code block will degenerate to a hash if it's empty or contains only a comma-separated list starting with either a pair or a %-sigil variable, and if it doesn't make use of any parameters. You can confirm that this covers all the cases above.

That might seem like a slightly arbitrary way of deciding, but it's actually the result of a fair bit of back-and-forth in the spec about when something is a closure and when it's a hash — and this spec iteration feels like a keeper. The previous ones led people into tricky situations where they supplied what they thought was a closure to a map, but it turned out to evaluate to a hash, and the multi dispatch to map failed. That doesn't seem to happen with the current spec, which is a good sign.
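One way to see the rule at work is to ask each literal what it turned into. This sketch assumes present-day Raku, which keeps the same hash-or-block rule; %stuff and @stuff are filler variables for the example.

```raku
my %stuff = x => 1;
my @stuff = 1, 2;

my @kinds =
    { }.^name,             # empty                      -> Hash
    { %stuff }.^name,      # starts with a %-variable   -> Hash
    { "a" => 1 }.^name,    # starts with a pair         -> Hash
    { "a" => $_ }.^name,   # uses $_                    -> Block
    { @stuff }.^name,      # starts with an @-variable  -> Block
    { "a", 1 }.^name;      # starts with a plain value  -> Block
```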

What are some common functions that accept blocks as arguments? I've already mentioned map, but the map/grep/sort triad has that slightly built-in feel, so they're not really good examples.

Here's one that's a good example:

my $sentence = 'eye drops off shelf';
my $newspaper-heading = $sentence.subst(/ \S+ /, { $/.ucfirst }, :global);
say $newspaper-heading; # Eye Drops Off Shelf

The vital part is the { $/.ucfirst } block. Why do we need to put that part in a block? Because if we didn't, it'd get executed immediately, that is, before the .subst call was even made. The { } block constructor creates a protective shell of delayed action (same principle as with orally administered pills, really), and the subst method can then invoke the block when the time is right, i.e. after a match has been found. Newcomers on #perl6 often leave out the curlies, thinking that it'll magically work anyway.
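A runnable version of the example, assuming present-day Raku, where ucfirst is spelled tc and the :g adverb asks for a global substitution. The extra $calls counter (an addition for illustration) shows that the block is run once per match, not once up front:

```raku
my $sentence = 'eye drops off shelf';

# The block is not run here. .subst runs it once per match, with $/ set
# to that match; $calls counts how often that happens.
my $calls = 0;
my $heading = $sentence.subst(/ \S+ /, { $calls++; $/.tc }, :g);

say $heading;   # Eye Drops Off Shelf
say $calls;     # 4
```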

If you're with me so far, you're ready for the next "look ma, no curlies!" stage.

We like closures so much (as language designers) that we want to build them into a lot of places. In a number of places, we want to build them in so much that we even decide to lose the { } circumfix! If that sounds crazy, just look at these perfectly harmless examples:

if !@food || @food[0].lc eq 'marmite' {
    say "You either have no food or just marmite!";
}

class VeryImportantObject {
    has $!creation-time = time();
}

.flip.say for lines();

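In all of the above cases, it's as if invisible { } curlies had been inserted for us around the thunked parts (the right-hand side of ||, the attribute's default value time(), and the .flip.say before the statement-modifying for), which are then evaluated only if and when the time is right. Closures without curlies are sometimes referred to as "thunks".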

(Why don't we special-case the second argument of Str.subst in the same way? Well, we certainly could, but it'd be kind of unfair to all other user-defined methods which don't automatically get the same special treatment. Somehow it's more OK to thunk language constructs like the infix:<||> operator, or has, or statement-modifying for, than it is to thunk the second argument in some method somewhere. But it's a perfect gotcha for static analysis to catch.)
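The three thunks from the examples above can be observed directly. In this sketch (present-day Raku; noisy, Ticket and the counters are illustration-only names), the right-hand side of || runs only when needed, an attribute default runs once per new object, and the statement before a modifying for runs once per item:

```raku
my $calls = 0;
sub noisy() { $calls++; True }

# The right-hand side of || is only evaluated when the left side is false.
my $r1 = True  || noisy();     # noisy() not called
my $r2 = False || noisy();     # noisy() called once

# An attribute default is re-evaluated for every new object.
my $made = 0;
class Ticket { has $.serial = ++$made }
my @tickets = Ticket.new, Ticket.new;

# The statement before a modifying for runs once per item, with $_ set.
my @flipped;
@flipped.push(.flip) for <abc de>;
```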

But Perl 6 also gives you, the programmer, a way to omit the curlies if you just want to create a little one-off closure somewhere. It's provided through the ubiquitous "whatever" star, after which Rakudo Star was named.

The whatever star represents a curious bit of spec development, kind of a little idea that seemed to get a life of its own after a while and spread everywhere, like gremlins. It all started when the old "index from the end" syntax from Perl 5 was re-considered:

@a[ -1]    # getting the last element in Perl 5
@a[*-1]    # getting the last element in Perl 6

Why was this change made? S09 sums it up:

The Perl 6 semantics avoids indexing discontinuities (a source of subtle runtime errors), and provides ordinal access in both directions at both ends of the array.

When this feature was finally implemented in Rakudo, instead of treating the * - 1 like a syntactic oddity that's only allowed to occur inside array indexings, they generalized the concept so that * - 1 means { $_ - 1 }. (Note the surrounding block curlies.) This was considered nifty and trickled back into the spec. So now you can use all of the following forms to mean the same thing:

      { $_ - 1 } # means "something minus one"
-> $_ { $_ - 1 } # explicit lambda mention of $_
-> $a { $a - 1 } # change the name of the param
     { $^a - 1 } # "self-declaring" param
         * - 1   # note the lack of curlies

I haven't mentioned the "self-declaring" type of parameter so far. They're very nice, especially in small blocks where an explicit signature would give the block too much front weight. The spec's name for them is "placeholder variables", because they make a space for themselves in the parameter list, I guess. The position each one gets is its rank in an ascending string sort of the variable names, by the way. You can't have both an explicit signature and placeholder variables for the same block; it's an either-or thing.

(Also, the only form in the list above which isn't exactly identical is the first one, which actually translates to -> $_? { $_ - 1 }. That is, the $_ is optional, and you can call the block with 0 arguments. I don't remember the rationale for this, nor whether I've ever benefited from it.)

A recent spec change generalized the whatever star so that if two or more occur in the same expression, they get assigned successive parameter slots. * + * translates into { $^a + $^b }, for example. So they're really starting to look a bit like "anonymous placeholder variables".
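A quick check that the five forms in the list behave the same, plus the two-star and placeholder-ordering rules. This is a sketch for present-day Raku; the arity/count lines illustrate the point about the first form having an optional $_ parameter, and the names add and rev-minus are made up for the example.

```raku
# The five forms from the list above, each called with 10.
my @forms =
          { $_ - 1 },
    -> $_ { $_ - 1 },
    -> $a { $a - 1 },
         { $^a - 1 },
             * - 1;
my @results = @forms.map(-> &f { f(10) });   # all 9

# Two stars become two parameters, like { $^a + $^b }.
my &add = * + *;

# Placeholders are ordered by name: $^a is the first argument, $^b the second.
my &rev-minus = { $^b - $^a };

# The bare-$_ form has an optional parameter: arity 0, count 1.
my $bare = { $_ - 1 };

say add(3, 4);                       # 7
say rev-minus(10, 3);                # -7
say $bare.arity, ' ', $bare.count;   # 0 1
```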

Now for the actual impetus for this post: in August 2000, almost ten years ago, Damian Conway made an RFC which anticipated a lot of the features outlined in this post. And it does this while suggesting syntax which is consistently less mnemonic and less maintainable than what we eventually ended up with. (Not the fault of Damian-from-ten-years-ago, of course. I'm pretty sure he's been instrumental in guiding us to many of the solutions we have today.)

Here's a quick summary of the key points of the RFC, and the modern Perl 6 responses:

  • The RFC suggests that one common use case for the "higher-order functions" (as it calls the closures) is in case statements with comparison ops in them, such as case ^_ < 10 { return 'milk' }. Note that the whatever star nicely fills this niche: when * < 10 { return 'milk' }.
  • Much of the rest of the RFC seems to be handled well today with either lambda signatures or placeholder variables (the $^a ones). $check = ^cylinder_vol == ^radius**2 * ^height or die ^last_words; could today be written $check = -> $cylinder_vol, $radius, $height, $last_words { $cylinder_vol == $radius ** 2 * $height or die $last_words };. I hesitate to put that example in placeholder-variable form, because there are too many placeholders, and the alphabetical mess would be too hard to maintain, even once.
  • The model we ended up with doesn't do automatic currying, like in the RFC. We do, however, have the extremely nice method .assuming on all Code objects (including Block), which gives you back a new Code object with one or more parameters pre-set.
  • Generally, the modern variants lean towards either explicit curlies or a whatever star to tell you that something mildly magical is going on. With the syntax proposed in the RFC, I suspect I'd be constantly less-than-certain about where implicit blocks ended.
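For the curious, here is the RFC's cylinder example as an explicit lambda in present-day Raku, with .assuming pre-setting the first parameter. The variable names $check and $check-hundred are illustrative; 4 ** 2 * 6.25 is exactly 100, so the first call succeeds, while a mismatched volume reaches the die with the supplied last words.

```raku
# The RFC's cylinder check as an explicit lambda.
my $check = -> $cylinder_vol, $radius, $height, $last_words {
    $cylinder_vol == $radius ** 2 * $height or die $last_words
};

# .assuming fixes the leading parameter and returns a new Code object
# that takes the remaining three arguments.
my $check-hundred = $check.assuming(100);

say $check-hundred(4, 6.25, 'wrong volume');   # True
```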

In Apocalypse 6 the RFC was accepted with a "c" rating (that's for "major caveats"). I think that's accurate, because the spirit of the RFC definitely lives on, but the syntax of it all turned out much, much better. I guess that's the point of having the role of Language Designer centralized in one person.

Having exhausted the things I have to say about this topic, I'll stop here and see if I can get some closure myself. 哈哈

Thursday July 01, 2010
04:48 PM

Yapsi 2010.07 Released!

It is with an unwarranted sense of contentment that I want to announce, on behalf of the Yapsi development team, the July 2010 release of Yapsi, a Perl 6 compiler written in Perl 6.

You can get it here — try it, it's fresh!

Yapsi is implemented in Perl 6. It thus requires a Perl 6 implementation to build and run. We recommend the 'alpha' branch of Rakudo for this purpose. In practice, this means downloading Rakudo Minneapolis (#25) from January 2010.

Yapsi is an "official and complete" implementation of Perl 6. The fact that it's complete follows from a simple mis-applied proof technique: adding to something complete, one certainly doesn't make it less complete. We are making plenty of additions to Yapsi. Hence, Yapsi is complete; QED. It's official because there's no official definition of "official".

Here's how to get Yapsi up and running, once you have it:

  • Make sure you have a built Rakudo alpha in your $PATH as 'alpha'.
  • (Optionally) run 'make' for a load-time speedup.

This month's release introduces 'if'/'else' statements and 'while' loops. Having Yapsi run all day on your computer just got a little easier. You're welcome.

Quite a lot of features are now within reach of people who are interested in hacking on Yapsi. See the doc/LOLHALP file for a list of 'em.

Yapsi consists of a compiler and a runtime. The program is compiled down into an instruction code that is "closer to the machine", for some imaginary value of "machine". Compiling down to an instruction code is not strictly necessary; if we wanted, we could compile down to tiny pecan pies. (But that would be NUTS! What we do instead is just plain SIC.) The instruction set changes quite a bit between releases; if you rely on SIC being stable between releases, you are completely bananas.

An overarching goal for making a Perl 6 compiler-and-runtime is to use it as a server for various other projects, which will hook in at different steps:

  • A time-traveling debugger (tardis), which hooks into the runtime.
  • A coverage tool (lid), which will also hook into the runtime.
  • A syntax checker (sigmund), which will use output from the parser.
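Here is a rough sketch of what "hooking into the runtime" can mean, again in present-day Perl 6 (Raku). The runtime calls an optional callback before each instruction, and the attached tool decides what to do with it. The instruction list, the `execute` sub and its `:&on-step` parameter are hypothetical. This is not Yapsi's actual interface, nor that of tardis or lid.

```raku
# Hypothetical instruction list for a tiny stack machine.
my @program = ['push', 2], ['push', 3], ['add'], ['say'];

# A runtime that reports each step to an optional hook before executing it.
sub execute(@code, :&on-step) {
    my @stack;
    my @output;
    for @code.kv -> $pc, $instr {
        on-step($pc, $instr) if &on-step;    # the hook point tools attach to
        my ($name, $arg) = @$instr;
        given $name {
            when 'push' { @stack.push($arg) }
            when 'add'  { my $right = @stack.pop; @stack.push(@stack.pop + $right) }
            when 'say'  { @output.push(@stack.pop) }    # collected instead of printed
            default     { die "unknown instruction '$name'" }
        }
    }
    @output
}

# A coverage tool in the spirit of lid only needs to count visits per instruction.
my %hits;
my @result = execute(@program, on-step => -> $pc, $ { %hits{$pc}++ });
say @result;                                                      # [5]
say %hits.elems, ' of ', @program.elems, ' instructions executed';  # 4 of 4 instructions executed

# A tracing hook, closer to what a time-traveling debugger would record.
my @trace;
execute(@program, on-step => -> $pc, $instr { @trace.push("{$pc}: {$instr.join(' ')}") });
.say for @trace;    # 0: push 2, 1: push 3, 2: add, 3: say (one per line)
```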

Another overarching goal is to optimize for fun while learning about parsers, compilers, and runtimes. We wish you the appropriate amount of fun!

Monday June 28, 2010
04:27 PM

Weeks 4 and 5 of GSoC work on Buf -- chrono-flies

Chrono-flies (Drosophila chronogaster, commonly known as "time-flies") are known for their fondness for arrows. Lately I've been distracted enough (by $DAYJOB, other Perl 6 projects, and other non-Perl 6 projects) to let two weeks pass by with only one commit in my local branch:

  • Buf is now Positional, and you can index it with postcircumfix:<[ ]>.

I merged/pushed that one a few moments ago.
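Because Buf is Positional, it can be indexed and sliced like an array. The short example below assumes present-day Rakudo, where `Buf.new` accepts the byte values as a list. The 2010 constructor signature was still under discussion at the time.

```raku
my $buf = Buf.new(72, 105, 33);   # the bytes of "Hi!"

say $buf ~~ Positional;   # True
say $buf[0];              # 72
say $buf[1, 2];           # (105 33)
say $buf[*-1];            # 33
say $buf.elems;           # 3
```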

For good measure (pronounced "only one commit! how'd that happen?"), I did some pre-investigation this evening as to how one might read binary data from a file into a Buf. cotto++ outlined how. Haven't finished thinking about this, but it does seem perfectly doable, which is a notch better than I feared. 哈哈 (ha ha)
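From the user's side, reading binary data into a Buf can look like the following in present-day Raku. It uses `IO::Path.spurt` with a Buf to write bytes and `slurp(:bin)` to read them back. This is not the Parrot-level approach cotto outlined, and the temporary file name is made up for the example.

```raku
# Write a handful of bytes (including ones above 127) to a temporary file.
my $path = $*TMPDIR.add("buf-demo-{$*PID}.bin");
$path.spurt(Buf.new(0, 1, 127, 128, 255));

# :bin asks for the raw bytes instead of decoded text.
my $buf = $path.slurp(:bin);
say $buf ~~ Buf;    # True
say $buf.list;      # (0 1 127 128 255)

$path.unlink;       # clean up the temporary file
```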

Also, worth noting here: remember how in Week 2, I changed the Buf constructor spec from slurpy array to non-slurpy array? Well, pmichaud wondered why, and it led to an interesting discussion. Will synch up with jnthn to see if it perhaps merits changing back for consistency with other list-y types.
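For anyone who hasn't followed the discussion, the difference between a slurpy and a non-slurpy array parameter shows up mainly at the call site. The two subs below are hypothetical stand-ins, not the real Buf constructor, and the code assumes present-day Rakudo.

```raku
# Hypothetical constructor with a slurpy parameter: takes any number of arguments.
sub from-slurpy(*@bytes) { Buf.new(@bytes) }

# Hypothetical constructor with a non-slurpy parameter: takes exactly one Positional.
sub from-array(@bytes) { Buf.new(@bytes) }

say from-slurpy(1, 2, 3).elems;     # 3
say from-slurpy([1, 2, 3]).elems;   # 3 (the array argument is flattened)
say from-array([1, 2, 3]).elems;    # 3
# from-array(1, 2, 3);              # rejected: too many positional arguments
```

The slurpy form lets callers pass a bare list, while the non-slurpy form insists on exactly one Positional argument. That call-site difference is presumably where consistency with the other list-y types comes in.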

Thursday June 17, 2010
06:46 PM

Announce: Rakudo Perl 6 development release #30 ("Kiev")

On behalf of the Rakudo development team, I'm pleased to announce the June 2010 development release of Rakudo Perl #30 "Kiev". Rakudo is an implementation of Perl 6 on the Parrot Virtual Machine. The tarball for the June 2010 release is available from Github.

Rakudo Perl follows a monthly release cycle, with each release named after a Perl Mongers group. This release is named after the Perl Mongers from the beautiful Ukrainian capital, Kiev. They recently helped organize and participated in the Perl Mova + YAPC::Russia conference, whose хакмит (hackathon) was a particular success for Rakudo. All those who joined the Rakudo hacking, from Kiev and further afield, contributed spec tests as well as patches to Rakudo, allowing various RT tickets to be closed and making this month's release better. Дякую (thank you)!

Some of the specific changes and improvements occurring with this release include:

  • Rakudo now uses immutable iterators internally, and generally hides their existence from programmers. Many more things are now evaluated lazily.
  • Backtraces no longer report routines from Parrot internals. This used to be the case in the Rakudo alpha branch as well, but this time they are also very pleasant to look at.
  • Match objects now act like real hashes and arrays.
  • Regexes can now interpolate variables.
  • Hash and array slicing has been improved.
  • The REPL shell now prints results, even when not explicitly asked to print them, thus respecting the "P" part of "REPL".
  • Rakudo now passes 33,378 spectests. We estimate that there are about 39,900 tests in the test suite, so Rakudo passes about 83% of all tests.
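The sketch below illustrates a few of these features in present-day Raku (the later name of Perl 6). Exact behavior in the Kiev release may have differed.

```raku
# Regexes can interpolate variables: $city matches its string value literally.
my $city = 'Kiev';
my $m = 'Rakudo #30 Kiev' ~~ / '#' (\d+) \s+ $<name>=[$city] /;

# Match objects act like arrays (positional captures) and hashes (named captures).
say ~$m[0];       # 30
say ~$m<name>;    # Kiev

# Hash and array slicing.
my %release = number => 30, name => 'Kiev', month => 'June';
say %release<number name>;    # (30 Kiev)
my @counts = 33_378, 39_900;
say @counts[1, 0];            # (39900 33378)

# Laziness: an infinite sequence is only computed as far as it is used.
my @evens = 0, 2 ... *;
say @evens[^4];               # (0 2 4 6)
```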

For a more detailed list of changes see docs/ChangeLog.

The development team thanks all of our contributors and sponsors for making Rakudo Perl possible, as well as those people who worked on Parrot, the Perl 6 test suite and the specification.

The following people contributed to this release: Patrick R. Michaud, Moritz Lenz, Jonathan Worthington, Solomon Foster, Patrick Abi Salloum, Carl Mäsak, Martin Berends, Will "Coke" Coleda, Vyacheslav Matjukhin, snarkyboojum, sorear, smashz, Jimmy Zhuo, Jonathan "Duke" Leto, Maxim Yemelyanov, Stéphane Payrard, Gerd Pokorra, cognominal, Bruce Keeler, Ævar Arnfjörð Bjarmason, Shrivatsan, Hongwen Qiu, quester, Alexey Grebenschikov, Timothy Totten

If you would like to contribute, see "How to help", ask on the perl6-compiler mailing list, or ask on IRC #perl6 on freenode.

The next release of Rakudo (#31) is scheduled for July 22, 2010. A list of the other planned release dates and code names for 2010 is available in the docs/release_guide.pod file. In general, Rakudo development releases are scheduled to occur two days after each Parrot monthly release. Parrot releases on the third Tuesday of each month.
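The schedule rule can be checked with a little date arithmetic. This sketch assumes the rule holds exactly ("third Tuesday, plus two days"), although the announcement only states it as the general case. `third-tuesday` and `rakudo-release` are names made up for the example, and the code uses present-day Raku's `Date` class.

```raku
# Date.day-of-week counts Monday as 1, so Tuesday is 2.
sub third-tuesday(Int $year, Int $month --> Date) {
    my $first = Date.new($year, $month, 1);
    my $to-tuesday = (2 - $first.day-of-week) % 7;   # Raku's % is floored: result is 0..6
    $first + $to-tuesday + 14                         # first Tuesday, then two weeks later
}

# Rakudo releases two days after Parrot.
sub rakudo-release(Int $year, Int $month --> Date) {
    third-tuesday($year, $month) + 2
}

say rakudo-release(2010, 6);   # 2010-06-17 (this release, "Kiev")
say rakudo-release(2010, 7);   # 2010-07-22 (the next one, #31)
```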

Have fun!

08:04 AM

It isn't quite TDD, but I like it

In several tightly controlled projects over the past few years, I seem to either follow or approximate this sequence of steps:

  1. Write a test suite skeleton.
  2. Flesh it out into a test suite. (Make the tests run with a minimal implementation skeleton.)
  3. Make the tests pass by fleshing out the implementation.

I haven't seen such a way of working mentioned elsewhere, so I thought I'd make note of it here.

The idea with the first step is to separate most of the thinking from the automatic task of writing the tests. I find if I do this, I get better test coverage, because separation allows me to retain an eagle-eye view of the model, whereas if I were to switch back and forth between thinking about the whole and writing about the parts, I'd lose sight of the whole. To some degree. Also, having something to flesh out cancels out the impulse to cheat and skip writing tests.

Step two ignores the TDD mandate to write only one failing test at a time. I still prefer to have the whole test suite done before starting the implementation, again because it saves me some context-switching. Usually I treat the implementation process in much the same way as if I had written the tests on demand. Occasionally a test already passes as soon as I write the minimal scaffold needed to run the tests. As I currently understand TDD, this is also "frowned upon". I leave such tests in, because they're still part of the specification, and might even catch regressions in the future.

I tried this out last weekend, and it was a really nice match with the problem domain — an I/O-free core of a package installer:

  1. Write a test suite skeleton: Just a bunch of prose comments.
  2. Flesh it out into a test suite: one commit per skeleton test file.
  3. Make the tests pass: one commit per subpart.

And presto, a complete (core) implementation with great test coverage.
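The three steps can be illustrated on a made-up miniature of an I/O-free installer core. The `install-order` sub and the dependency data are hypothetical and are not taken from pls's actual code. Each comment marked "Step 1" stands in for the prose skeleton, the test under it is the step-2 fleshing-out, and the sub at the top is the step-3 implementation (a depth-first dependency ordering). The code is written for present-day Raku with its bundled `Test` module.

```raku
use Test;

# Step 3: the implementation, fleshed out until the tests pass.
# Depth-first ordering: every dependency is placed before the projects needing it.
sub install-order(Str $project, %deps) {
    my @order;
    my %state;    # 'visiting' while on the DFS stack, 'done' once placed in @order
    sub visit(Str $name) {
        my $status = %state{$name} // 'new';
        return if $status eq 'done';
        die "Cyclic dependency involving '$name'" if $status eq 'visiting';
        %state{$name} = 'visiting';
        visit($_) for @(%deps{$name} // ());
        %state{$name} = 'done';
        @order.push($name);
    }
    visit($project);
    @order
}

my %deps = app => <web json>, web => <sockets>;

# Step 1: "A project without dependencies is installed on its own."
is-deeply install-order('json', %deps), ['json'], 'no dependencies';

# Step 1: "Dependencies are installed before the projects that need them."
is-deeply install-order('app', %deps), [<sockets web json app>], 'dependencies first';

# Step 1: "A dependency cycle is reported instead of looping forever."
dies-ok { install-order('a', %(a => <b>, b => <a>)) }, 'cycles are detected';

done-testing;
```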

Those who follow the links to actual commits will note that mistakes are corrected during the implementation phase. That's a symptom of the halting-problem-esque nature of code in general: you don't know its true quality until you've run it in all possible ways.

Sunday June 13, 2010
05:59 PM

Week 3 of GSoC work on Buf -- talk like a Parrot day

Remember the long-term solution for converting between strings and arrays of bytes that, as I mentioned last week, the Parrot people were discussing?

No? Well, anyway, NotFound++ wrote one, and he suggested I try it for my Str⇄Buf conversions. It worked!
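In user-facing terms, the Str⇄Buf conversions are `Str.encode` and `Buf.decode`. Below is a minimal round trip, assuming present-day Rakudo. In that version, `Str.encode` returns an immutable `Blob` subtype rather than a mutable `Buf`.

```raku
my $str   = 'Mäsak';
my $bytes = $str.encode('utf-8');    # Str -> bytes: 'ä' becomes two bytes
say $bytes.list;                     # (77 195 164 115 97 107)
say $str.chars, ' chars, ', $bytes.elems, ' bytes';   # 5 chars, 6 bytes

# Buf -> Str: decoding the same bytes gives the original string back.
my $buf = Buf.new(77, 195, 164, 115, 97, 107);
say $buf.decode('utf-8') eq $str;    # True
```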

I thought I would get more than that done this week, but I didn't. Oh well. For next week, there are still a few low-hanging branches of fruit to pursue.

  • I want to add postcircumfix:<[ ]> indexing to the Buf class, so it feels a bit more Positional.
  • There's a lot of bounds-checking that needs to be done, both in the constructor and in decoding.
  • I have UTF-8 down pat; need to find a way to convince Parrot to do other encodings, such as ISO-8859-1.
  • There's one test that talks about NFD. Haven't even started looking at that yet.
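The encoding and NFD items are easier to picture with concrete bytes, and the bounds check with a concrete constructor. The sketch below assumes present-day Rakudo, where `iso-8859-1` is one of the supported encoding names. It shows that the same character encodes differently under UTF-8 and ISO-8859-1, and that NFD splits 'ä' into a base letter plus a combining mark. `checked-buf` is a hypothetical constructor, not the real Buf API.

```raku
my $char = 'ä';
say $char.encode('utf-8').list;        # (195 164)
say $char.encode('iso-8859-1').list;   # (228)
say $char.NFD.list;                    # (97 776): 'a' plus U+0308 COMBINING DIAERESIS

# Hypothetical bounds-checking constructor: every value must fit in one byte.
sub checked-buf(*@values) {
    for @values -> $v {
        die "Not a byte value: $v" unless $v ~~ Int && 0 <= $v <= 255;
    }
    Buf.new(@values)
}

say checked-buf(104, 105).decode('iso-8859-1');   # hi
say (try checked-buf(300)) // 'rejected';          # rejected
```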

It's interesting to see where all the effort goes. I've spent most of my time so far on the Str.encode and Buf.decode methods, and almost everything else is trivial in comparison. Feels like some sort of 90%-10% rule at work.