Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Monday August 29, 2005
01:28 PM

Lexing without Parsing

[ #26502 ]

I'm writing a new article and am going to figure out where to submit it. This one is about lexing without parsing (well, lexing without a grammar, to be accurate). Perl is great at munging text, but sometimes we need to analyze complicated text for which we have no proper grammar. If the data is a line-oriented logfile with a very predictable format, that's no big deal. However, if it's rather irregular and the regular expressions are getting too complicated, lexing the data into predictable tokens can make a hard problem very easy to manage.

A good example of this is parsing SQL. There's no complete SQL grammar written in Perl, and the snippet I posted showed how to extract column aliases. SQL::Statement uses SQL::Parser and doesn't handle CASE statements, so it's not a solution, though Jeff Zucker welcomes patches :)
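To make the idea concrete, here is a minimal sketch of lexing SQL and pulling column aliases out of the token stream. It is written in Python rather than Perl, and it is not the snippet mentioned above; the names `TOKEN_SPEC`, `lex` and `column_aliases`, as well as the sample query, are made up for illustration. It assumes aliases are always introduced with an explicit `AS`, and it ignores quoted identifiers and comments. The point is that the token walk never has to understand CASE: it only tracks parenthesis depth and looks for `AS` at the top level of the select list.

```python
import re

# Token types are tried in order; the first alternative that matches wins.
TOKEN_SPEC = [
    ("STRING", r"'(?:[^']|'')*'"),            # single-quoted literal, '' escapes a quote
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD",   r"[A-Za-z_][A-Za-z0-9_]*"),    # keywords and identifiers alike
    ("OP",     r"<>|<=|>=|[-+*/=<>.]"),
    ("COMMA",  r","),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SPACE",  r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(text):
    """Return a list of (type, value) tokens, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot lex {text[pos:pos + 10]!r} at offset {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def column_aliases(sql):
    """Collect the names that follow AS at the top level of the select list."""
    aliases, depth, in_select = [], 0, False
    tokens = lex(sql)
    for i, (kind, value) in enumerate(tokens):
        word = value.upper() if kind == "WORD" else None
        if kind == "LPAREN":
            depth += 1
        elif kind == "RPAREN":
            depth -= 1
        elif depth == 0 and word == "SELECT":
            in_select = True
        elif depth == 0 and word == "FROM":
            in_select = False
        elif in_select and depth == 0 and word == "AS" and i + 1 < len(tokens):
            aliases.append(tokens[i + 1][1])
    return aliases

sql = """
SELECT first_name AS name,
       CASE WHEN salary > 50000 THEN 'high' ELSE 'low' END AS band,
       (SELECT max(x) AS inner_alias FROM t2) AS biggest
FROM employees
"""
print(column_aliases(sql))  # ['name', 'band', 'biggest']
```

The alias inside the subquery is skipped because it sits at parenthesis depth 1, which is the kind of bookkeeping that gets painful with a single regular expression.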

Disclaimer: I'm not claiming that this technique (which I learned from Higher-Order Perl, or HOP, I might add) is the best way of solving the "parsing SQL" problem. It's merely an illustration of the technique involved. A more complicated example involves transforming math expressions.

X \= 9 / (3 + (4+7) % ModValue) + 2 / (3+7).

I found myself needing to transform expressions like that into Prolog terms, respecting operator precedence and allowing parentheses to override it, so that the expression above becomes this:

ne(X, plus(div(9, mod(plus(3, plus(4, 7)), ModValue)), div(2, plus(3, 7)))).
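One way such a transformation can be sketched is a small lexer feeding a precedence-climbing loop. The following is a Python illustration, not the author's Perl code. The functor names `ne`, `plus`, `div` and `mod` come from the output above, but the `BINARY` precedence table is an assumption (the conventional one, where `/` and `%` bind tighter than `+`), and only the four operators seen in this post are supported. The post does not show its own precedence rules, so this sketch may nest `%` differently from the author's output.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OP>\\=|[+/%])
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<END>\.)
  | (?P<SPACE>\s+)
""", re.VERBOSE)

# Assumed table: operator -> (precedence, Prolog functor). Higher binds tighter.
BINARY = {r"\=": (1, "ne"), "+": (2, "plus"), "/": (3, "div"), "%": (3, "mod")}

def lex(text):
    """Split the input into (type, value) tokens, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot lex {text[pos:pos + 10]!r} at offset {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def to_prolog(text):
    """Translate one infix expression ending in '.' into a nested Prolog term."""
    tokens = lex(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def primary():
        kind, value = advance()
        if kind in ("NUMBER", "NAME"):
            return value
        if kind == "LPAREN":
            inner = expression(1)          # parentheses restart at the lowest precedence
            if advance()[0] != "RPAREN":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")

    def expression(min_prec):
        left = primary()
        while True:
            kind, value = peek()
            if kind != "OP" or BINARY[value][0] < min_prec:
                return left
            prec, functor = BINARY[value]
            advance()
            right = expression(prec + 1)   # prec + 1 makes operators left-associative
            left = f"{functor}({left}, {right})"

    term = expression(1)
    if advance()[0] != "END":
        raise SyntaxError("expected terminating '.'")
    return term + "."

print(to_prolog(r"Y \= 2 / (3+7) + 1."))  # ne(Y, plus(div(2, plus(3, 7)), 1)).
```

Because the lexer has already reduced the input to a handful of token types, the precedence logic never has to look at raw characters.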

Writing a simple lexer made this very easy to do, though I'll probably beef it up to allow constant folding, which would reduce the result to:

ne(X, plus(div(9, mod(14, ModValue)), .2)).
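Constant folding itself can be sketched as a bottom-up walk over the term tree. This is again a Python illustration under stated assumptions, not the author's implementation: terms are represented as hypothetical `(functor, left, right)` tuples, the `FOLDABLE` table and the `fold` and `render` helpers are invented here, and the tree is typed in by hand to match the unfolded term shown earlier.

```python
import operator

# Hypothetical mapping from functor name to the arithmetic it denotes.
FOLDABLE = {"plus": operator.add, "div": operator.truediv, "mod": operator.mod}

def fold(term):
    """Fold constant subterms. A term is a number, a name (str), or (functor, left, right)."""
    if not isinstance(term, tuple):
        return term
    functor, left, right = term
    left, right = fold(left), fold(right)      # fold the children first
    both_numbers = all(isinstance(side, (int, float)) for side in (left, right))
    if both_numbers and functor in FOLDABLE:
        try:
            return FOLDABLE[functor](left, right)
        except ZeroDivisionError:
            pass                               # leave the error for Prolog to report
    return (functor, left, right)

def render(term):
    """Print a term tree back out in Prolog syntax."""
    if isinstance(term, tuple):
        functor, left, right = term
        return f"{functor}({render(left)}, {render(right)})"
    return str(term)

tree = ("ne", "X",
        ("plus",
         ("div", 9, ("mod", ("plus", 3, ("plus", 4, 7)), "ModValue")),
         ("div", 2, ("plus", 3, 7))))

print(render(fold(tree)) + ".")  # ne(X, plus(div(9, mod(14, ModValue)), 0.2)).
```

Anything that touches a variable such as ModValue is left alone, so only the purely numeric subterms collapse. Python prints the last constant as 0.2 where the term above writes .2.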

  • Creating a grammar for ANSI SQL is possible, I suppose, but vendor nuances would make it difficult to come up with an all-encompassing solution.

    I wonder if the major DB vendors have their respective grammars hidden in a vault somewhere, taunting us.

    • In Oracle's case, they absolutely do.

      I used to use the online docs with the "railroad" graphics in them (which are derived from the BNF) all the time.
    • That's why SQL::Parser allows you to name the SQL dialect you're working with.

      As a general purpose solution, though, I've thought it would be nice to use the parser from SQLite. A tempting thought, but my C is probably not up to snuff.

      • Is there an Oracle dialect available somewhere for SQL::Parser? I have a little "project" to see which Oracle columns and tables are being used in an internally developed report generator.
        • I'm not an expert with SQL::Parser. You'll have to ask jzed, the author. His contact info is in the docs. He's quite helpful.