On my Parrot days this week I started to add operator support (just + and - to start), but ran into a snag with PGE's operator precedence parser. I expect it's just an undocumented necessary step, so I dropped a note to Patrick and set that aside.
Instead, I made the changes to take advantage of the fact that PGE::Text::bracketed now extracts the bracketed value for you. (Thanks, Patrick!) Then I started on comma lists. I've got them parsing and transforming to PAST. I'm currently trying to decide on the best way to handle the transformation to POST. Specifically, I want to transform a Punie statement such as:
print 1, 2;
into the PIR:
print 1
print 2
So internally, I'm splitting a PAST::Op (print) node with a PAST::Op (comma) child into two sibling POST::Op (print) nodes. There are several ways to do it; it's mainly a question of where I want the complexity to bubble up (waterbed theory).
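One place the complexity can live is the PAST-to-POST step itself. Here is a minimal Python sketch of that option (not TGE or PIR, and not necessarily the approach Punie ends up using). The PastOp, PastVal and PostOp classes are hypothetical, simplified stand-ins for the real node types, and nested comma lists are flattened so that print 1, 2, 3 also works.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical, simplified node classes; the real PAST/POST nodes carry
# more attributes (source, pos, ...).
@dataclass
class PastOp:
    op: str
    children: List["PastNode"] = field(default_factory=list)

@dataclass
class PastVal:
    value: str

PastNode = Union[PastOp, PastVal]

@dataclass
class PostOp:
    op: str
    args: List[str]

def flatten_comma(node: PastNode) -> List[PastNode]:
    """Return the operands of a (possibly nested) comma list, in order."""
    if isinstance(node, PastOp) and node.op == "comma":
        items: List[PastNode] = []
        for child in node.children:
            items.extend(flatten_comma(child))
        return items
    return [node]

def print_to_post(node: PastOp) -> List[PostOp]:
    """Split print(comma(a, b, ...)) into sibling print ops, one per operand."""
    if node.op != "print":
        raise ValueError(f"expected a print op, got {node.op!r}")
    ops: List[PostOp] = []
    for child in node.children:
        for item in flatten_comma(child):
            if not isinstance(item, PastVal):
                raise TypeError("this sketch only handles literal values")
            ops.append(PostOp("print", [item.value]))
    return ops

tree = PastOp("print", [PastOp("comma", [PastVal("1"), PastVal("2")])])
for post in print_to_post(tree):
    print(post.op, *post.args)
# prints:
# print 1
# print 2
```

Doing the split here keeps the later PIR-generation step trivial; the alternative is to let a single POST::Op carry several arguments and push the complexity into the code generator instead.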
~~~~~~~~~~~~~~~~~~~~
One interesting little tidbit: because PGE is a recursive descent parser, I can't just directly translate certain rules from the original perl.y. The biggest problem is left recursion:
rule expr {
<PunieGrammar::expr> = <PunieGrammar::expr>
| <PunieGrammar::expr> \+ <PunieGrammar::expr>
|...
| <PunieGrammar::term>
}
In a recursive descent parser, left recursion never terminates: to match expr, the parser first tries to match expr again, without consuming any input. One solution is to eliminate the left recursion and translate the rule into something like:
rule expr {
<PunieGrammar::term> <PunieGrammar::rexpr>
}
rule rexpr {
= <PunieGrammar::term> <PunieGrammar::rexpr>
| \+ <PunieGrammar::term> <PunieGrammar::rexpr>
|...
| <null>
}
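To make the rewritten grammar concrete, here is a minimal Python sketch of a recursive-descent parser for the left-factored form expr ::= term rexpr, rexpr ::= ('+' | '-') term rexpr | null. It only handles integers, + and -, and the nested-tuple tree is an assumption for illustration, not PGE's match object.

```python
import re
from typing import List, Optional

def tokenize(src: str) -> List[str]:
    # Integers and the operators + and -; anything else (e.g. spaces) is skipped.
    return re.findall(r"\d+|[+-]", src)

class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # expr ::= term rexpr  (term consumes input first, so no infinite loop)
        return ("expr", self.term(), self.rexpr())

    def term(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number at token {self.pos}")
        self.pos += 1
        return ("term", tok)

    def rexpr(self):
        # rexpr ::= ('+' | '-') term rexpr | <null>
        tok = self.peek()
        if tok in ("+", "-"):
            self.pos += 1
            return ("rexpr", tok, self.term(), self.rexpr())
        return ("null",)

def parse(src: str):
    p = Parser(tokenize(src))
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected token {p.peek()!r}")
    return tree

print(parse("1 + 2 - 3"))
# prints:
# ('expr', ('term', '1'), ('rexpr', '+', ('term', '2'), ('rexpr', '-', ('term', '3'), ('null',))))
```

In the output, the - node sits inside the + node's rexpr, and neither node holds its left operand directly, so a later pass would have to rebuild left associativity (and, with more operators, precedence) from that chain.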
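Unfortunately, the rexpr version produces a match tree that's less than ideal for working out operator precedence: each operator ends up in its own nested rexpr node, separated from its left operand, and all the operators sit in one flat chain regardless of precedence. That's why PGE has a built-in operator precedence (shift/reduce) parser.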
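As a point of comparison, here is a minimal Python sketch of the general shift/reduce idea. Terms and operators are shifted onto two stacks, and the operator on top of the stack is reduced with its two operands whenever the incoming operator binds less tightly (or equally tightly, for left-associative operators). This does not model PGE's operator precedence parser or its interface; the PRECEDENCE table and RIGHT_ASSOC set are hypothetical.

```python
import re
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3}
RIGHT_ASSOC = {"="}  # assignment groups right-to-left

def tokenize(src: str) -> List[str]:
    return re.findall(r"\w+|[=+\-*/]", src)

def parse(src: str) -> Tree:
    operands: List[Tree] = []  # term stack
    operators: List[str] = []  # operator stack

    def reduce() -> None:
        # Pop one operator and its two operands; push the combined node.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    expect_term = True
    for tok in tokenize(src):
        if expect_term:
            if tok in PRECEDENCE:
                raise SyntaxError(f"expected a term, got {tok!r}")
            operands.append(tok)  # shift a term
        else:
            if tok not in PRECEDENCE:
                raise SyntaxError(f"expected an operator, got {tok!r}")
            # Reduce while the stacked operator binds tighter, or equally
            # tightly when the incoming operator is left-associative.
            while operators and (
                PRECEDENCE[operators[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[operators[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOC)
            ):
                reduce()
            operators.append(tok)  # shift the operator
        expect_term = not expect_term

    if expect_term:
        raise SyntaxError("incomplete expression")
    while operators:
        reduce()
    return operands[0]

print(parse("1 + 2 * 3 - 4"))  # ('-', ('+', '1', ('*', '2', '3')), '4')
print(parse("a = b = 1 + 2"))  # ('=', 'a', ('=', 'b', ('+', '1', '2')))
```

Unlike the rexpr chain, each operator node here holds both of its operands, and the tree shape already reflects precedence and associativity.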
I've been off work for most of the past two weeks. I wish I could say I got lots of lovely development time in there, but unfortunately I spent most of it doped up and lying on a couch with a heating pad to my face trying to get my jaw to unlock (complications after getting all 4 wisdom teeth pulled). It's finally unlocked now and last night I got to eat steak for the first time. That may very well be the most delicious steak I've ever tasted.
I'm still not quite 100%, but I plan to get back to my normal development schedule next week.
One bit of development news I forgot to post before I went AFK: I finished the switch to using a pre-compiled PGE grammar for parsing Punie. This is a tiny speed boost, but more importantly it makes the Punie grammar easier to expand (and maintain), because the grammar source is abstracted out into a separate file, instead of being mixed up with PIR code.
Do you have a shortcut or two you use all the time when you're writing Perl code? I don't mean how you condense your code from 12 lines to 3 characters, but things like the Vim or Emacs incantations you can't live without, or Firefox shortcuts so you can just type "cpan Module::Name" in the URL bar to jump to search.cpan.org results. Or, perhaps something written in Perl that's a little off-the-wall but makes your life easier or more fun, like your favorite Perl GUI widget, or a Perl-controlled alarm clock?
If you have an idea like this, or if you just want to debate the rules of the CPAN drinking game, drop us a note at alias+perl_hacks_contrib@wgz.org (this will send your message to the list and subscribe you). The best ideas will get added to a book called Perl Hacks.
Today I worked on expanding Punie. Until now it has only been able to print a single digit, and only allowed one print statement in a file. It can now handle more than one statement in a file, and can print multi-digit integers and double-quoted strings.
print 1;
print 23;
print "ok 1\n";
The double-quoted strings use the PGE::Text::bracketed feature Patrick developed.
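Matching a double-quoted string means finding the first closing quote that isn't escaped, not just the next quote character. The Python sketch below illustrates that idea and returns both the whole match and the value between the delimiters. The function name match_delimited is hypothetical, and this is an assumption about the general technique, not a description of how PGE::Text::bracketed is implemented. Converting escapes like \n into real characters is a separate step that isn't shown.

```python
from typing import Optional, Tuple

def match_delimited(text: str, pos: int, delim: str = '"') -> Optional[Tuple[str, str, int]]:
    """Match a delimited string starting at text[pos] (hypothetical helper).

    Returns (full_match, inner_value, end_pos), or None if there is no
    opening delimiter at pos or no closing delimiter before the end of text.
    A backslash escapes the following character, so \\" does not end the string.
    """
    if pos >= len(text) or text[pos] != delim:
        return None
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == delim:
            full = text[pos:i + 1]
            return full, full[1:-1], i + 1
        i += 1
    return None  # unterminated string

src = 'print "ok 1\\n";'  # the Punie source line: print "ok 1\n";
full, inner, end = match_delimited(src, src.index('"'))
print(full, inner, src[end:])
# prints: "ok 1\n" ok 1\n ;
```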
I've also started working on switching the Punie grammar over to the new, more maintainable way PGE offers for defining grammars. So, instead of:
p6rule('\d+ | <PGE::Text::bracketed: ">', 'PunieGrammar', 'term')
...
I can now say:
grammar PunieGrammar;
rule term { \d+ | <PGE::Text::bracketed: "> }
...
This week I worked on the tree grammar to transform Punie's abstract syntax tree (PAST) to an opcode syntax tree (POST). That went along pretty quickly, even with a little time wasted down an odd rabbit hole or two. The current set of POST nodes is minimal because I'm still only trying to compile "print 1;". They'll grow and change as I expand the compiler.
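The general shape of a tree grammar is one rule per input node type, where each rule builds an output node and recursively transforms its children. Here is a minimal Python stand-in for that idea (it is not TGE's syntax or API). The generic Node class and the POST::Ops container name are hypothetical; the PAST structure follows the "print 1;" tree shown elsewhere in these posts, minus the source and pos attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    kind: str  # e.g. "PAST::Op" or "POST::Op"
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

class TreeGrammar:
    """Minimal stand-in for a tree grammar: one rule per input node kind."""

    def __init__(self) -> None:
        self.rules: Dict[str, Callable[["TreeGrammar", Node], Node]] = {}

    def rule(self, kind: str):
        def register(fn):
            self.rules[kind] = fn
            return fn
        return register

    def transform(self, node: Node) -> Node:
        if node.kind not in self.rules:
            raise ValueError(f"no rule for {node.kind}")
        return self.rules[node.kind](self, node)

past_to_post = TreeGrammar()

@past_to_post.rule("PAST::Stmts")
def _stmts(g: TreeGrammar, node: Node) -> Node:
    return Node("POST::Ops", children=[g.transform(c) for c in node.children])

def _unwrap(g: TreeGrammar, node: Node) -> Node:
    # Statement and expression wrappers add nothing POST needs yet,
    # so each is replaced by the transform of its single child.
    (child,) = node.children
    return g.transform(child)

past_to_post.rule("PAST::Stmt")(_unwrap)
past_to_post.rule("PAST::Exp")(_unwrap)

@past_to_post.rule("PAST::Op")
def _op(g: TreeGrammar, node: Node) -> Node:
    return Node("POST::Op", {"op": node.attrs["op"]},
                [g.transform(c) for c in node.children])

@past_to_post.rule("PAST::Val")
def _val(g: TreeGrammar, node: Node) -> Node:
    return Node("POST::Val", {"value": node.attrs["value"]})

# PAST for "print 1;"
past = Node("PAST::Stmts", children=[
    Node("PAST::Stmt", children=[
        Node("PAST::Exp", children=[
            Node("PAST::Op", {"op": "print"}, [
                Node("PAST::Exp", children=[Node("PAST::Val", {"value": "1"})]),
            ]),
        ]),
    ]),
])
post = past_to_post.transform(past)
print(post)
```

Keeping the rules in a table keyed by node type means supporting a new PAST node is a matter of adding one rule, which fits growing the POST node set as the compiler expands.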
Since I had a little time left over, I went ahead and started working on the next step: converting POST into something executable. For now, PIR output is good enough. I just need to close the final gap so I can run tests on Perl 1 code. So, I threw together a little mock-up to fill the gap. It's a TGE tree grammar, but instead of transforming one tree to another tree, it transforms a tree to a string of source code. It's kind of like a PGE grammar in reverse.
That also went pretty fast and I still had a little time left over, so I went ahead with the last step, which is compiling and executing the PIR code. That part only takes a few lines of code.
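The tree-to-source step works like any tree walk, except that each rule returns text instead of a new node. Here is a minimal Python sketch of that idea (not the TGE grammar itself). The node kinds are hypothetical, values are assumed to be emitted verbatim (so a string literal keeps its quotes and escapes), and the output is wrapped in a PIR `.sub main :main` ... `.end` block.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    kind: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

def emit(node: Node) -> List[str]:
    """Turn a (hypothetical) POST tree into lines of PIR source."""
    if node.kind == "POST::Ops":
        lines: List[str] = []
        for child in node.children:
            lines.extend(emit(child))
        return lines
    if node.kind == "POST::Op":
        args = ", ".join(emit_value(c) for c in node.children)
        return [f"{node.attrs['op']} {args}"]
    raise ValueError(f"cannot emit {node.kind}")

def emit_value(node: Node) -> str:
    if node.kind != "POST::Val":
        raise ValueError(f"expected a value, got {node.kind}")
    return node.attrs["value"]  # emitted verbatim (assumption)

def to_pir(post: Node) -> str:
    body = "".join(f"    {line}\n" for line in emit(post))
    return f".sub main :main\n{body}.end\n"

post = Node("POST::Ops", children=[
    Node("POST::Op", {"op": "print"}, [Node("POST::Val", {"value": "1"})]),
    Node("POST::Op", {"op": "print"}, [Node("POST::Val", {"value": '"ok 1\\n"'})]),
])
print(to_pir(post), end="")
# prints:
# .sub main :main
#     print 1
#     print "ok 1\n"
# .end
```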
So, I swapped punie2.pir in as the main punie.pir, ran the language tests and they passed. Yay!
Next week I'll expand Punie so it does more than just print a single digit.
I've checked in the code to transform Punie match objects into ASTs. The core pieces are:
languages/punie/lib/pge2past.g
languages/punie/lib/PAST/
[…] punie2.pir, because it's still not far enough along to replace punie.pir.
So, if you run:
parrot punie2.pir demo.p1
You'll get a match object something like:
"$/" => PMC 'PGE::Rule' => "print 1;" @ 0 {
<PunieGrammar::expr> => PMC 'PunieGrammar' => "print 1" @ 0 {
<PunieGrammar::gprint> => PMC 'PunieGrammar' => "print 1" @ 0 {
<PunieGrammar::expr> => PMC 'PunieGrammar' => "1" @ 6 {
<PunieGrammar::term> => PMC 'PunieGrammar' => "1" @ 6
}
[0] => PMC 'PunieGrammar' => "print" @ 0
}
}
}
And an AST something like:
<PAST::Stmts> => {
'source' => 'print 1;',
'pos' => '0',
'children' => [
<PAST::Stmt> => {
'source' => 'print 1;',
'pos' => '0',
'children' => [
<PAST::Exp> => {
'source' => 'print 1',
'pos' => '0',
'children' => [
<PAST::Op> => {
'source' => 'print 1',
'pos' => '0',
'op' => 'print',
'children' => [
<PAST::Exp> => {
'source' => '1',
'pos' => '6',
'children' => [
<PAST::Val> => {
'source' => '1',
'pos' => '6',
'value' => '1',
}
]
}
]
}
]
}
]
}
]
}
In the AST, source is the chunk of source code from the original file that corresponds to the current node (good for error reporting), pos is the offset from the start of the source file to the point where the corresponding match node started to match, and children is a list of the current node's child nodes. Some node types, like PAST::Op and PAST::Val, have custom attributes (op and value).
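To show why source and pos are handy for error reporting, here is a minimal Python sketch. The dataclasses mirror the common attributes described above plus the op and value attributes; the classes, the line_and_column and error_at helpers, and the message format are all hypothetical. pos is stored as an int here, whereas the dump shows it as a string.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PastNode:
    source: str  # chunk of original source this node covers
    pos: int     # offset of that chunk from the start of the file
    children: List["PastNode"] = field(default_factory=list)

@dataclass
class PastOp(PastNode):
    op: str = ""     # custom attribute, e.g. "print"

@dataclass
class PastVal(PastNode):
    value: str = ""  # custom attribute, e.g. "1"

def line_and_column(file_text: str, pos: int) -> Tuple[int, int]:
    """1-based line and column for a character offset into the file."""
    line = file_text.count("\n", 0, pos) + 1
    line_start = file_text.rfind("\n", 0, pos) + 1  # 0 when on the first line
    return line, pos - line_start + 1

def error_at(file_text: str, node: PastNode, message: str) -> str:
    line, col = line_and_column(file_text, node.pos)
    return f"{message} at line {line}, column {col}, near {node.source!r}"

# A simplified "print 1;" tree (the PAST::Exp wrapper is omitted).
tree = PastOp(source="print 1", pos=0, op="print",
              children=[PastVal(source="1", pos=6, value="1")])

text = "print 1;\nprint 23;\n"
bad = PastVal(source="23", pos=15, value="23")
print(error_at(text, bad, "illustrative error"))
# prints: illustrative error at line 2, column 7, near '23'
```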
I've still got some time left to work today, so I'll start on the OST now (though I probably won't bother to post about it unless I run across something exciting).
I forgot to write about my Parrot day last week. I started work on using TGE in Punie. TGE takes care of all the steps in the compiler except the last step of compiling the low-level opcode syntax tree (OST) down to Parrot bytecode.
I'm approaching it by first implementing the steps all the way through, but only for the single statement "print 1;". So far, I'm partway through the initial step of transforming the output from PGE into the high-level abstract syntax tree. That seems like slow progress (to me, at least), but I spent a good bit of the time waffling on what AST nodes to use. Then I decided I'd just start with something similar to Pugs' PIL and refine it as I go along. (Okay, I decided that when I first wrote the draft design doc, but sometimes I need to decide the same thing twice before it sticks.)
This week should go faster, through the AST transformations and hopefully even on to the OST transformations.