Friday January 09, 2009
Java parser in Parrot/PGE
My favorite part of Perl 6 is the new grammar syntax. Over the last couple of days, I translated a Java source code grammar from antlr to PGE. After about 4-5 hours of work, I now have a Perl 6 grammar that can parse all of the .java files in the OpenJDK (the Java 7 source code). Well, that may be a lie. It's still crunching at about 5-10 seconds per file, so it will be a while before I know if it's really true.
Admittedly, most of the credit goes to the authors of the antlr grammar I adapted, but this also says good things about the Perl 6 regex implementation in Parrot.
The things that bit me hardest were:
- negated classes (PGE doesn't understand "<-[abc]>" so I had to make the inner part a separate token)
- antlr allows character ranges outside of any character class syntax (antlr: "'0'..'9'", perl: "<[0..9]>")
- longest token on integer vs. float (I had to change the antlr grammar to put float ahead of integer)
- whitespace (I cribbed from the Pipp implementation)
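
The integer vs. float item is easy to reproduce outside of PGE. Here is a minimal sketch in Python, on the assumption that the alternation is tried in order and the first alternative that matches wins (which is how Python's re module behaves). The token patterns and the first_token helper are made up for illustration and are not taken from the antlr grammar or my PGE translation.

```python
import re

# Hypothetical token patterns for illustration only; the real grammar's
# integer and float rules are more involved (hex, exponents, suffixes).
INTEGER = r"\d+"
FLOAT = r"\d+\.\d+"


def first_token(alternatives, text):
    """Return the text matched at position 0 by an ordered alternation."""
    # Join the alternatives with "|" so they are tried left to right.
    pattern = re.compile("|".join(f"(?:{alt})" for alt in alternatives))
    match = pattern.match(text)
    return match.group(0) if match else None


# Integer first: the alternation commits to "3" and leaves ".14" unparsed.
integer_first = first_token([INTEGER, FLOAT], "3.14")

# Float first: the longer literal is tried before the integer fallback.
float_first = first_token([FLOAT, INTEGER], "3.14")

print(integer_first, float_first)
```

With integer listed first, the match stops at "3"; with float listed first, the whole "3.14" is consumed, and a plain integer like "42" still matches through the fallback alternative.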