tinman spent a few years mucking around in industry before going back to school for a Master's. Currently not enjoying the weather in the North of England.
He wrote Perl that looked suspiciously like C code in 1998, while working as an intern, and has been trying to cure that bad habit ever since.
Playing around with JavaCC (for some weird reason, that URL is HTTPS). Slightly mangled code, but it really does a great job.
This whole Token business is beginning to depress me. I'm trying to wriggle in a few of my custom filters before tokenization even begins in the Lucene sample, and the classes are a bit, err... complicated. Oh, well. If it were easy, it wouldn't be this much fun trying to figure it all out.
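For what it's worth, the general idea of filtering text before tokenization begins can be sketched with nothing but java.io. This is only a rough illustration under stated assumptions: it does not use Lucene's or JavaCC's classes, the PreTokenFilter name and its cleaning rule (lower-case letters and digits, turn everything else into a space) are hypothetical, and the whitespace split in main is just a stand-in for a real tokenizer.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical pre-tokenization filter (not a Lucene class): it wraps any
// Reader and cleans characters before a tokenizer gets to see them.
public class PreTokenFilter extends FilterReader {

    public PreTokenFilter(Reader in) {
        super(in);
    }

    // Illustrative rule only: keep letters and digits (lower-cased),
    // replace every other character with a single space.
    private static char clean(char c) {
        return Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : ' ';
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == -1 ? -1 : clean((char) c);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // At end of stream n is -1, so the loop body never runs.
        for (int i = off; i < off + n; i++) {
            buf[i] = clean(buf[i]);
        }
        return n;
    }

    // Convenience helper: run the filter over a string, return the cleaned text.
    public static String filter(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new PreTokenFilter(new StringReader(text))) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the real tokenizer: split the cleaned text on whitespace.
        for (String token : filter("Hello, JavaCC World!").trim().split("\\s+")) {
            System.out.println(token);
        }
    }
}
```

Because the filter is itself a Reader, anything that accepts a Reader could in principle be handed the wrapped stream instead of the raw one; whether that fits the Lucene sample's classes is a separate question.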
One other note: a coworker (well, someone else at the university) got a Tomcat cluster working. mod_jk2 in front serving requests and session replication within the cluster. Cool stuff (and it's all FREE. I know the expensive app servers can replicate sessions
ANTLR (Score:2)
Re:ANTLR (Score:1)
I looked at a number of parser generators (flex, GOLD, ANTLR, ...) to mark tokens up as XML. I was struck by two things in my search:
Re:ANTLR (Score:2)
As for a grammar for Perl 5: there is no such thing as a context-free grammar for Perl 5. You can always look at perly.y in the perl source distribution, but the tokenizer is the scary part. "Nothing but perl c
Re:ANTLR (Score:1)
Oh, yes. Thanks. I knew about Antlr and friends from earlier attempts to write a parser for a not-so-simple configuration file. But it seems the Lucene project uses JavaCC, which I had never used before. I was wondering if it would be as good, and it seems so. Lucene is under the Apache Software Foundation, so perhaps they were turned off by the vague-sounding license [antlr.org] terms? Maybe the guy who wrote the code initially didn't know about Antlr? I'm just speculating :)