All the Perl that's Practical to Extract and Report


tinman (2063)

tinman spent a few years mucking around in industry before going back to school for a Master's. Currently not enjoying the weather in the north of England.

He wrote Perl that looked suspiciously like C code in 1998, while working as an intern, and has been trying to cure that bad habit ever since.

Journal of tinman (2063)

Wednesday March 24, 2004
12:09 PM

parsers and stringy stuff

[ #18046 ]

Playing around with JavaCC (For some weird reason, that URL is HTTPS). Slightly mangled code, but it really does a great job.

This whole Token business is beginning to depress me. I'm trying to wriggle a few of my custom filters in before tokenization even begins in the Lucene sample, and the classes are a bit, err... complicated. Oh, well. If it were easy, it wouldn't be this much fun trying to figure it all out.

One other note: a coworker (well, someone else at the university) got a Tomcat cluster working. mod_jk2 in front serving requests and session replication within the cluster. Cool stuff (and it's all FREE. I know the expensive app servers can replicate sessions ;)

  • If you're busy with Java parser generators, you might want to look at the excellent ANTLR [antlr.org] as well.
    • I looked at a number of parser generators (flex, GOLD, ANTLR, ...) to mark tokens up as XML. I was struck by two things in my search:

      1. Apparently, I am the first person in the history of the universe to want to do this.
      2. Given the amount of time PGs have been around, plus the number and size of their communities, there do not seem to be any comprehensive grammar repositories (try looking for Perl, VBScript, TSQL, JavaScript, ...).
      • About grammar repositories: the style and shape of a grammar are influenced by the constraints of the parser generator (mostly LL vs. LR), which explains why there is no generic grammar repository. It would be impractical, since grammars would need adapting to each parser generator.

        And about a grammar for Perl 5: there is no such thing as a context-free grammar for Perl 5. You can always look at perly.y in the perl source distribution, but the tokenizer is the scary part. "Nothing but perl c

    • Oh, yes. Thanks. I knew about Antlr and friends from earlier attempts to write a parser for a not-so-simple configuration file. But it seems the Lucene project uses JavaCC, which I had never used before. I was wondering if it would be as good, and it seems so. Lucene is under the Apache Software Foundation, so perhaps they were turned off by the vague-sounding license [antlr.org] terms? Maybe the guy who wrote the code initially didn't know about Antlr? I'm just speculating :)