Friday January 09, 2009
12:02 AM
Java parser in Parrot/PGE
My favorite part of Perl 6 is the new grammar syntax. Over the last couple of days, I translated a Java source code grammar from antlr to PGE. After about 4-5 hours of work, I now have a Perl 6 grammar that can parse all of the .java files in the OpenJDK (the Java 7 source code). Well, that may be a lie. It's still crunching at about 5-10 seconds per file, so it will be a while before I know if it's really true.
Admittedly most of the credit goes to the authors of the antlr grammar I adapted, but this also says good things about the Perl 6 regex implementation in Parrot.
The things that bit me hardest were:
- negated classes (PGE doesn't understand "<-[abc]>" so I had to make the inner part a separate token)
- antlr allows character ranges outside of any character class syntax (antlr: "'0'..'9'", perl: "<[0..9]>")
- longest token on integer vs. float (I had to change the antlr grammar to put float ahead of integer)
- whitespace (I cribbed from the Pipp implementation)
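The integer vs. float item is easiest to see with a small example. This is only a sketch in Python's re module, used as a stand-in for PGE, on the assumption that the alternation here is tried in the order written rather than by longest match. The pattern names and the first_token helper are made up for illustration and are not part of the grammar.

```python
import re

# Ordered alternation: the first alternative that matches wins,
# even when a later alternative would consume more input.
INTEGER_FIRST = re.compile(r"\d+|\d+\.\d+")
FLOAT_FIRST = re.compile(r"\d+\.\d+|\d+")


def first_token(pattern, text):
    """Return the text matched at the start of `text`, or None if nothing matches."""
    m = pattern.match(text)
    return m.group(0) if m else None


# With integer listed first, the integer branch matches "3" and stops there.
print(first_token(INTEGER_FIRST, "3.14"))
# With float listed first, the whole literal is consumed.
print(first_token(FLOAT_FIRST, "3.14"))
# Plain integers still match through the second branch.
print(first_token(FLOAT_FIRST, "42"))
```

A longest-token matcher would pick the float in both orderings, which is presumably why the original antlr rule order did not matter there.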
Gah!!! (Score:1)
Re: (Score:1)
My thought exactly, but I'm a couple years behind you in familiarity with Parrot. I'm happy to give you commit to my SVN, or move the grammar to another repository.
It turns out I still have some grammar issues to work out. I'm failing about 3% of the JDK source code, mostly due to longest-token assumptions in the antlr grammar. But I expect to be at 100% by next week. Plus, the grammar could use some refactoring to use the optable. I'm hoping to look at that.
Re: (Score:1)
Is there a 12-step program?
Re: (Score:1)
If it takes twelve steps, we need to make PCT even easier to use.