
Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Sunday January 20, 2008
08:54 AM

Relax(i)NG and Staying DRY in Bermuda

[ #35435 ]

I had an acceptance tester walk up to me and ask why there was no revision information attached to a particular element in our REST interface. It was a bug. Fixing that bug required one line of code. I added it. The tests failed.

Turns out that it requires two lines of code. I added the other line. The tests failed.

I updated the requisite RELAX NG schema. The tests failed.

I've started updating the hard-coded XML files in some acceptance tests. My brain failed.

Wouldn't it be nice if all you had to do was update a schema? Imagine reading a RELAX NG schema and generating the appropriate Perl code to create the data structures, which then get run through an XML generator to produce XML that automatically validates against the RELAX NG. You could even autogenerate tests with this. Then you just update your RELAX NG and that's it!

Doesn't quite work. RELAX NG is all about the structure of a document, not its meaning. If you see a last-modified attribute three years in the future, you know it's probably wrong, but RELAX NG doesn't. So I experimented with adding annotations to RELAX NG, but it was making my parser more complicated. RELAX NG doesn't do everything I need.
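
To make the "structure, not meaning" point concrete, here is a minimal sketch of a check that a grammar can't express. It assumes the last-modified value is a UTC timestamp of the form 2008-01-20T08:54:00Z; the format and both function names are hypothetical, not part of Bermuda.

```perl
use strict;
use warnings;
use Time::Local 'timegm';

# Hypothetical helper: parse a UTC timestamp such as 2008-01-20T08:54:00Z
# and return epoch seconds, or undef if the string is not a valid timestamp.
sub parse_utc_timestamp {
    my ($stamp) = @_;
    my ( $y, $mon, $d, $h, $min, $s )
      = $stamp =~ /\A(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z\z/
      or return;

    # timegm croaks on out-of-range fields (month 13, day 32, ...),
    # so trap that and report undef instead.
    return eval { timegm( $s, $min, $h, $d, $mon - 1, $y ) };
}

# The semantic check itself: a last-modified value must not lie in the
# future. $now is optional and defaults to the current time.
sub is_plausible_last_modified {
    my ( $stamp, $now ) = @_;
    $now = time unless defined $now;
    my $epoch = parse_utc_timestamp($stamp);
    return 0 unless defined $epoch;
    return $epoch <= $now ? 1 : 0;
}
```

A schema can say that the attribute is a date; only code like this can say that the date is believable.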

A further problem is that RELAX NG does more than I need:

element addressBook {
  element card {
    attribute name { text },
    attribute email { text }+
  }*
}

See that plus sign after the email attribute? It means "one or more". An element can't actually carry two attributes with the same name (that's not well-formed XML, and attributes are inherently unordered anyway), so the repetition is meaningless there. You can also write your RELAX NG with a grammar and specify the start element, but that doesn't fit my needs in this case. As a result, a programmer can write valid RELAX NG which wouldn't work in my system, so that's an expectation violation. It also further complicated my parser and was making my hard work much harder. RELAX NG simultaneously does too little and too much. It's not a bad fit, but it's not a great one, either. (A pure Perl RELAX NG parser would help tremendously, but writing one would delay this even further.)

I've started writing custom YAML files which do exactly what I need. The system is named "Bermuda" and the YAML files, named "islands", have a .bmd extension (GameCube also uses a .bmd extension). Code which generates code is very difficult to write, but the payoffs are huge. I seriously doubt this will ever see the light of day, but it's a fun project. Here's a sample YAML file:

---
package: My::Card
island: card
attributes:
  href:
    type: anyURI
    method: url
  revision?:
    if: 'defined $card->revision'
    type: positiveInteger
elements:
  - name
  -
    type: string
  - email*
  -
    type: string
  - phone+
  -
    method: phone_numbers
    type: string
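
The ?, * and + suffixes on the names above appear to mirror RELAX NG's occurrence indicators. Assuming that reading is right, a minimal sketch of how a name such as phone+ could be split into a bare name and a cardinality looks like this (parse_island_name and %OCCURS are hypothetical names, not taken from Bermuda itself):

```perl
use strict;
use warnings;

# Occurrence suffix => [ minimum, maximum ]; undef means "unbounded".
# Assumption: the suffixes carry the same meaning as in RELAX NG.
my %OCCURS = (
    ''  => [ 1, 1 ],
    '?' => [ 0, 1 ],
    '*' => [ 0, undef ],
    '+' => [ 1, undef ],
);

# Hypothetical helper: split "phone+" into its name and cardinality.
sub parse_island_name {
    my ($raw) = @_;
    my ( $name, $suffix ) = $raw =~ /\A([A-Za-z_][\w.-]*?)([?*+]?)\z/
      or die "Cannot parse name '$raw'";
    my ( $min, $max ) = @{ $OCCURS{$suffix} };
    return { name => $name, min => $min, max => $max };
}

# Show what each name in the sample island would decode to.
for my $raw (qw( href revision? name email* phone+ )) {
    my $spec = parse_island_name($raw);
    printf "%-9s min=%d max=%s\n", $spec->{name}, $spec->{min},
      defined $spec->{max} ? $spec->{max} : 'unbounded';
}
```

A minimum of 1 is what would justify the generated croak when phone_numbers returns nothing.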

And here's the generated code (yes, I actually have this much working):

package My::Card::Bermuda;

use strict;
use warnings;
use Carp 'croak';

sub new {
    my ( $class, $instance ) = @_;
    return bless {
        data => {
            island     => 'card',
            attributes => {},
            elements   => [],
        },
        instance => $instance,
    } => $class;
}

sub instance { shift->{instance} }
sub name     { 'card' }

sub build {
    my ( $self ) = @_;
    $self->_add_attributes;
    $self->_add_elements;
    return $self;
}

sub _add_attributes {
    my ($self) = @_;
    my $card = $self->instance;
    if (defined $card->revision) {
        $self->{data}{attributes}{revision} = $card->revision;
    }
    $self->{data}{attributes}{href} = $card->url;
    return $self;
}

sub _add_elements {
    my ($self) = @_;
    my $card = $self->instance;
    my $count;
    push @{ $self->{data}{elements} } => {
        name       => 'name',
        attributes => {},
        element    => [ $card->name ],
    };
    $count = 0;
    foreach my $email ( $card->email ) {
        $count++;
        push @{ $self->{data}{elements} } => {
            name       => 'email',
            attributes => {},
            element    => $email,
        };
    }
    $count = 0;
    foreach my $phone ( $card->phone_numbers ) {
        $count++;
        push @{ $self->{data}{elements} } => {
            name       => 'phone',
            attributes => {},
            element    => $phone,
        };
    }
    unless ($count) {
        croak("Method 'phone_numbers' failed to return at least one element");
    }
    return $self;
}

1;
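
The remaining step, turning that data structure into XML, isn't shown in the post. As a rough sketch only: assuming each node is a hash with name, attributes and element keys (where element is either a scalar or an array reference, as in the generated code), a serializer might look like the following. node_to_xml and xml_escape are hypothetical names, and a real system would more likely hand this job to an XML module.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical serializer for one node of the form
# { name => ..., attributes => {...}, element => scalar or arrayref }.
sub node_to_xml {
    my ($node) = @_;

    # Attributes are unordered in XML; sort them so output is stable.
    my $attributes = $node->{attributes} || {};
    my $attrs      = join '', map {
        sprintf ' %s="%s"', $_, xml_escape( $attributes->{$_} )
    } sort keys %$attributes;

    # Accept either a single value or a list of values/child nodes.
    my $content  = $node->{element};
    my @children = ref($content) eq 'ARRAY' ? @$content : ($content);
    my $body     = join '',
      map { ref($_) ? node_to_xml($_) : xml_escape($_) }
      grep { defined } @children;

    return "<$node->{name}$attrs>$body</$node->{name}>";
}

# Usage with hand-built sample data (the values are made up).
my $card = {
    name       => 'card',
    attributes => { href => 'http://example.com/card/1', revision => 3 },
    element    => [
        { name => 'name',  attributes => {}, element => ['Ovid'] },
        { name => 'phone', attributes => {}, element => '555-1234' },
    ],
};
print node_to_xml($card), "\n";
```

Note that the top level of the generated structure uses island and elements rather than name and element, so it would need a small adapter before being passed to a function like this.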

I'm not writing out the RELAX NG yet, but it would look something like this:

element card {
    attribute revision { xsd:positiveInteger }?,
    attribute href     { xsd:anyURI },
    element   name     { xsd:string },
    element   email    { xsd:string }*,
    element   phone    { xsd:string }+
}

That's a nice syntax and I really wish I could use it, but it just doesn't quite fit :(

  • Have you tried to combine Schematron with RELAX NG? It’s quite easily possible, as RNG has extension points that allow for such an undertaking, and Schematron gives you rule- as opposed to grammar-based validation. In short, Schematron rules are arbitrary XPath expressions that must match/be true in the contexts you specify for them. Particularly with suitable XPath extension functions, that lets you validate pretty much any kind of constraint whatsoever.

    (You can also use Schematron standalone, but

    • Actually, I have trang installed and used that to convert the compact grammar to XML. I had stuff like this:

      element card {

          ## if: defined $card->revision
          attribute revision { xsd:positiveInteger }?,
          element name    { xsd:string },
          element email   { xsd:string },

          ## method: phone_numbers
          element phone   { xsd:string }*
      }

      And it was getting converted to this:

      <?xml version="1.0" encoding="U