use Perl: All the Perl that's Practical to Extract and Report

jsmith (3335)

I'm a web applications developer trying to bring all things open source to all things humanities at Texas A&M University.

Journal of jsmith (3335)

Thursday March 10, 2005
03:54 PM

Perl Syntax Mangling and XML Compilers

[ #23591 ]

My mind has been trying to wrap itself around the idea of an XML Compiler Compiler: something that takes a description of an XML language (such as a RELAX NG schema with some additional bits explaining how to actually do the compilation) and writes a SAX handler that performs the compilation. Of course, we need to allow one XML language to embed, or be embedded in, another. (A lot of this comes from the work on the Gestinanna project, which resulted in a compiler for state machines and another for workflows that shared a lot of common code.)
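To make the idea a little more concrete, here is a minimal sketch built on several assumptions. The description format (a Perl hash keyed by namespace URI and element name, holding start and end code templates), the namespace URIs, the `<for-each>`/`<collect>` vocabulary, and the `Sketch::Compiled` class are all hypothetical. It also interprets the description at run time instead of writing out a handler's source code, which is what a real compiler compiler would do. Events are plain hashes shaped like Perl SAX 2 element events (`NamespaceURI`, `LocalName`, and `Attributes` keyed by `{ns}name`).

```perl
use strict;
use warnings;

package Sketch::Compiled;

# Build a handler object from a description (hypothetical format).
sub new {
    my ($class, $desc) = @_;
    return bless { desc => $desc, out => '' }, $class;
}

# Look up the rule for an element by namespace URI and local name; this
# per-namespace dispatch is what lets one vocabulary nest inside another.
sub _rule {
    my ($self, $el) = @_;
    my $rule = $self->{desc}{ $el->{NamespaceURI} }{ $el->{LocalName} }
        or die "no rule for {$el->{NamespaceURI}}$el->{LocalName}\n";
    return $rule;
}

sub start_element {
    my ($self, $el) = @_;
    # Flatten Perl SAX 2 attributes ({ns}name => { LocalName, Value, ... })
    my %attr = map { ( $_->{LocalName} => $_->{Value} ) }
               values %{ $el->{Attributes} || {} };
    $self->{out} .= $self->_rule($el)->{start}->(%attr);
}

sub end_element {
    my ($self, $el) = @_;
    $self->{out} .= $self->_rule($el)->{end}->();
}

package main;

# Hypothetical description: namespace URI => element name => templates.
my %description = (
    'urn:example:flow' => {
        'for-each' => {
            start => sub { my %attr = @_; "foreach my \$item ($attr{select}) {\n" },
            end   => sub { "}\n" },
        },
    },
    'urn:example:out' => {
        'collect' => {
            start => sub { "push \@seen, \$item;\n" },
            end   => sub { '' },
        },
    },
);

# Events for <f:for-each select="1 .. 3"><o:collect/></f:for-each>
my $for_each = { NamespaceURI => 'urn:example:flow', LocalName => 'for-each',
                 Attributes   => { '{}select' => { LocalName => 'select', Value => '1 .. 3' } } };
my $collect  = { NamespaceURI => 'urn:example:out', LocalName => 'collect' };

my $h = Sketch::Compiled->new(\%description);
$h->start_element($for_each);
$h->start_element($collect);
$h->end_element($collect);
$h->end_element($for_each);

# The generated code refers to @seen, so declare it before evaluating.
my @seen;
eval $h->{out};
die $@ if $@;
print $h->{out}, "=> @seen\n";
```

Because each event is dispatched on its own namespace, the `<collect>` element from the second vocabulary can sit inside `<for-each>` from the first without either rule knowing about the other.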

With that in mind, I started fresh work on the compiler code without trying to hack the existing Gestinanna modules. I also changed where I put the commas and semicolons.

Now, I have code like the following:

package XML::Compiler::SAXHandler

; sub new_handler {
    my($type, %params) = @_

  ; return bless { %params
                 , Context => [ ]
                 , Current_NS => { }
                 } => $type
  }

; sub start_document {
    my $e = shift

  ; my $sub
  ; foreach my $ns ($e -> handled_namespaces) {
        foreach my $h ($e -> ns_handlers($ns)) {
            # for each namespace, try its handlers' own start_document
            # (skipping this generic one) until one returns true
            if( ($sub = $h -> can('start_document'))
                && ($sub != \&start_document) ) {
                $sub->($h, $e) && last
            }
        }
    }
  }

; sub start_element {
    my($e, $el) = @_
  ; $el -> {Parent} ||= $e -> {Current_Element}
    # inherit the parent's in-scope namespaces; the ( ... || { } ) guard
    # keeps the root element's Parent from being autovivified into { }
  ; $el -> {Namespaces} = { %{ ($el -> {Parent} || { }) -> {Namespaces} || { } }
                          , %{ $el -> {Namespaces} || { } }
                          }
  ; $e -> {Current_Element} = $el

  ; my $ns = $el -> {NamespaceURI}

  ; my @attribs
  ; my @defined_ns

    # Perl SAX 2 passes Attributes as a hash ref keyed by {ns}name
  ; foreach my $attr (values %{ $el -> {Attributes} || { } }) {
        # ... attribute processing elided ...
    }

  ; $el -> {_Defined_NS} = \@defined_ns
  ; $el -> {Attributes} = \@attribs

  # need to handle block v. expression context setting
  ; push @{$e -> {Context}}, 0
  }

After you uncross your eyes, you might wonder why I put the semicolons and commas at the beginning of the line. (Instead of being optional at the end of a block, they are now optional at the beginning of one, and semicolons are also optional after a block.) The key is the last comment in the code example. Semicolons delimit a series of statements, while commas delimit a series of expressions. The traditional approach of putting the separator at the end of each item means the generator needs to know what kind of code it just emitted. Putting it at the beginning means it only needs to know what kind of code it is expecting next.
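A minimal sketch of the layout rule, using a made-up helper `series` (not part of any module): the separator is chosen from the context the caller says it expects and placed in front of every fragment except the first. Evaluating the result checks that Perl accepts the leading-separator layout.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out code fragments with the separator in front
# of every fragment after the first. The caller only says which context it
# expects ('statement' or 'expression'); the fragments are never inspected.
sub series {
    my ($context, @fragments) = @_;
    my $sep = $context eq 'statement' ? ';' : ',';
    my $i = 0;
    # The first fragment gets two spaces so it lines up with "; " / ", ".
    return join '', map { ($i++ ? "$sep " : '  ') . "$_\n" } @fragments;
}

my $list  = series(expression => 1, 2, 3);
my $block = series(statement =>
    "my \@xs = (\n$list)",
    'my $sum = 0',
    '$sum += $_ for @xs',
    '$sum',
);

print $block;
my $result = eval $block;
die $@ if $@;
print "=> $result\n";
```

The statement series and the expression series come out of the same function; only the expected context differs.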

By tracking what we expect when we see a start element, we can hopefully simplify code that would otherwise have to choose between a statement terminator and an expression terminator when it handles the end element.
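The following is a minimal sketch, not the Gestinanna or XML::Compiler code, of how start-element bookkeeping can pick the separator. It assumes a made-up vocabulary (`<block>` for statements, `<list target="...">` for a list assignment, `<stmt>` and `<item>` leaves), a hypothetical `Sketch::Emitter` class, and events driven by hand rather than by a parser. Each `start_element` consults the context its parent pushed; `end_element` only pops the stack and closes brackets.

```perl
use strict;
use warnings;

package Sketch::Emitter;

# Separator used between the children of each container element.
my %SEP = (block => "\n; ", list => ', ');

sub new {
    my ($class) = @_;
    return bless { Context => [], Out => '' }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    # Prefix with the parent's separator unless this is its first child.
    if (my $parent = $self->{Context}[-1]) {
        $self->{Out} .= $parent->{sep} if $parent->{count}++;
    }
    if ($el->{Name} eq 'list') {
        my $target = $el->{Attributes}{'{}target'}{Value};
        $self->{Out} .= "my $target = (";
    }
    # Record what this element's own children should be separated by.
    push @{ $self->{Context} }, { sep => $SEP{ $el->{Name} } // '', count => 0 };
}

sub characters {
    my ($self, $chars) = @_;
    $self->{Out} .= $chars->{Data};
}

# No terminator decision here: just pop the context and close brackets.
sub end_element {
    my ($self, $el) = @_;
    pop @{ $self->{Context} };
    $self->{Out} .= ')' if $el->{Name} eq 'list';
}

package main;

my $h = Sketch::Emitter->new;
my $leaf = sub {    # start, text, end for a leaf element
    my ($name, $text) = @_;
    $h->start_element({ Name => $name });
    $h->characters({ Data => $text });
    $h->end_element({ Name => $name });
};

$h->start_element({ Name => 'block' });
$h->start_element({ Name => 'list',
    Attributes => { '{}target' => { Name => 'target', Value => '@xs' } } });
$leaf->(item => $_) for 1, 2, 3;
$h->end_element({ Name => 'list' });
$leaf->(stmt => 'my $sum = 0');
$leaf->(stmt => '$sum += $_ for @xs');
$leaf->(stmt => '$sum');
$h->end_element({ Name => 'block' });

print $h->{Out}, "\n";
my $result = eval $h->{Out};
die $@ if $@;
print "=> $result\n";
```

The generated text uses the same leading-semicolon layout as the handler code above, and the list items get commas without `end_element` ever deciding between the two.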

  • There's a reason in Perl that you can always end a block with a semicolon (it doesn't create a null statement) and can always end a list with a comma (it doesn't create an extra null element). Please do that. Don't write Perl code the way no sane person would do it, even automatically.
    • Randal L. Schwartz
    • Stonehenge
    • I agree that the ultimate code generated should be readable. I was just doing that as an exercise to get a different view on things. Some of the things XML languages require are a bit twisted when they get translated into a language such as Perl (e.g., should the <for-each/> become a foreach or a map in Perl? is it in a statement or an expression context? need to throw some <sort/> expressions in before we iterate...).

      The plan would be to put the code through a pretty-printer anyway, whi

    • Horror:
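The first comment's point about trailing separators can be checked with a short sketch (the helper `terminated` is hypothetical). Every fragment is emitted followed by its context's separator, and Perl accepts the extra comma at the end of a list and the extra semicolon before a block's closing brace. The generator still has to pick `;` or `,` for each context; this layout only moves the separator to the end of each item.

```perl
use strict;
use warnings;

# Hypothetical helper: emit every fragment followed by the separator, so
# the last fragment needs no special case.
sub terminated {
    my ($sep, @fragments) = @_;
    return join '', map { $_ . $sep . "\n" } @fragments;
}

# A trailing comma adds no extra element.
my $list = '(' . terminated(',', 1, 2, 3) . ')';
my @xs = eval $list;
die $@ if $@;

# A trailing semicolon before "}" is accepted; do returns the last value.
my $code = "do {\n"
    . terminated(';',
        "my \@ys = $list",
        'my $sum = 0',
        '$sum += $_ for @ys',
        '$sum')
    . "}";
my $result = eval $code;
die $@ if $@;

print scalar(@xs), " elements; sum => $result\n";
```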