ChrisDolan (2855)

http://www.chrisdolan.net/

Journal of ChrisDolan (2855)

Friday January 09, 2009
12:02 AM

Java parser in Parrot/PGE

[ #38246 ]

My favorite part of Perl 6 is the new grammar syntax. Over the last couple of days, I translated a Java source code grammar from antlr to PGE. After about 4-5 hours of work, I now have a Perl 6 grammar that can parse all of the .java files in the OpenJDK (the Java 7 source code). Well, that may be a lie. It's still crunching at about 5-10 seconds per file, so it will be a while before I know if it's really true.

Admittedly most of the credit goes to the authors of the antlr grammar I adapted, but this also says good things about the Perl 6 regex implementation in Parrot.

The things that bit me hardest were:

  1. negated classes (PGE doesn't understand "<-[abc]>" so I had to make the inner part a separate token)
  2. character ranges (antlr allows them outside of any character class syntax; antlr: "'0'..'9'", perl: "<[0..9]>")
  3. longest token on integer vs. float (I had to change the antlr grammar to put float ahead of integer)
  4. whitespace (I cribbed from the Pipp implementation)
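
To illustrate item 3, here is a minimal sketch in Python rather than PGE. It assumes the alternation is tried in order, which is how Python's re module behaves and is my reading of why float had to move ahead of integer. The patterns and the first_token helper are made up for illustration and are not taken from the actual grammar.

```python
import re

# Ordered alternation: the first alternative that matches wins,
# even if a later alternative would match more of the input.
INT_FIRST = re.compile(r"\d+|\d+\.\d+")    # integer listed before float
FLOAT_FIRST = re.compile(r"\d+\.\d+|\d+")  # float listed before integer


def first_token(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


print(first_token(INT_FIRST, "3.14"))    # stops after the integer part
print(first_token(FLOAT_FIRST, "3.14"))  # consumes the whole float
print(first_token(FLOAT_FIRST, "42"))    # plain integers still match
```

With the integer alternative first, the match stops at "3" and leaves ".14" behind, which is the kind of failure that reordering the alternatives avoids.
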
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • So now I'm tempted to take this and write a bunch of actions to produce PAST and see how quickly I can make it compile some Java programs... Yes, I'm a compiler-writing junkie.
    • My thought exactly, but I'm a couple years behind you in familiarity with Parrot. I'm happy to give you commit to my SVN, or move the grammar to another repository.

      It turns out I still have some grammar issues to work out. I'm failing on about 3% of the JDK source code, mostly due to longest-token assumptions in the antlr grammar. But I expect to be at 100% by next week. Plus, the grammar could use some refactoring to use the optable. I'm hoping to look at that.

    • Is there a 12-step program?