Journal of pshangov (3074)

Tuesday August 11, 2009
09:36 AM

Data::AsObject Released - Data Structures Made Easy

[ #39444 ]
Cross-posted from http://mechanicalrevolution.com/.

Perl is notorious for its punctuation-ridden syntax, and nowhere is this more obvious than when working with data structures. While I can see the beauty behind the line noise and have nothing against the syntax per se, it sometimes feels like there are just too many characters to type. In particular, I have recently had to do a lot of work with XML data represented as Perl hashes, via XML::TreePP and XML::Compile. Working with the data structures generated by these modules can quickly become painful.

Enter Data::AsObject. It allows you to work with hash and array references as if they were objects. For example, I often have to process XLIFF files, which are used in the translation industry. Using XML::Compile, I can get my XLIFF files parsed into a hash and use it as follows (you don't need to know the details of the XLIFF format to see the point of the example):

# $xliff holds the XML data as a hash
# get the source language of the first file
my $source_lang = $xliff->{'seq_any'}->[0]->{'file'}->{'source-language'};

# get all the translation units in the first file
my @trans_units = @{ $xliff->{'seq_any'}->[0]->{'file'}->{'body'}->{'cho_group'}->[0]->{'trans-unit'} };

# for each translation unit, add an alternative translation with a source and a target
foreach my $tu (@trans_units) {
    my @matches = get_matches($source->textContent);

    my $id = 0;
    foreach my $match (@matches) {
        $tu->{'cho_context-group'}->[$id]->{'alt-trans'}->{'source'}->{'_'} = $match->source;
        $tu->{'cho_context-group'}->[$id]->{'alt-trans'}->{'target'}->{'_'} = $match->target;
        $id++;
    }
}

The same example with Data::AsObject (for this to work, hooks need to be added to XML::Compile to automatically convert “source-language”, “trans-unit” and other elements with dashes to “source_language”, “trans_unit” etc.):

# Data::AsObject::dao converts a hashref or an arrayref to a
# Data::AsObject::Hash or a Data::AsObject::Array object
dao $xliff;

my $source_lang = $xliff->seq_any(0)->file->source_language;
my @trans_units = $xliff->seq_any(0)->file->body->cho_group(0)->trans_unit;

foreach my $tu (@trans_units) {
    my @matches = get_matches($source->textContent);

    my $id = 0;
    foreach my $match (@matches) {
        $tu->cho_context_group($id)->alt_trans->source->{'_'} = $match->source;
        $tu->cho_context_group($id)->alt_trans->target->{'_'} = $match->target;
        $id++;
    }
}

This is an almost real-life example, and you can easily see the readability benefits Data::AsObject provides. Of course, there are many caveats, the primary one being that you need to control your input and guarantee that hash keys contain only alphanumeric characters and underscores. Check out the docs for more usage details.
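
If you cannot hook into the parser, one way to meet that key requirement is to normalize the keys after the fact. The following is only a rough sketch: normalize_keys is a hypothetical helper written for this illustration (it is not part of Data::AsObject or XML::Compile), and the sample data is made up. It returns a deep copy in which every hash-key character other than a letter, digit or underscore is replaced by an underscore.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::AsObject or XML::Compile.
# Returns a deep copy of a nested hash/array structure in which every
# hash-key character outside [A-Za-z0-9_] is replaced by "_".
sub normalize_keys {
    my ($data) = @_;
    my $type = ref $data;

    if ($type eq 'HASH') {
        my %copy;
        for my $key (keys %$data) {
            (my $clean = $key) =~ s/[^A-Za-z0-9_]/_/g;
            # e.g. "a-b" and "a_b" would both map to "a_b"
            die "Key collision on '$clean'\n" if exists $copy{$clean};
            $copy{$clean} = normalize_keys($data->{$key});
        }
        return \%copy;
    }

    if ($type eq 'ARRAY') {
        return [ map normalize_keys($_), @$data ];
    }

    # Plain scalars and any other kind of reference are returned as-is
    return $data;
}

# Made-up sample data, loosely shaped like the XLIFF hash above
my $raw = {
    'seq_any' => [
        {
            'file' => {
                'source-language' => 'en',
                'trans-unit'      => [ { id => 1 } ],
            },
        },
    ],
};

my $clean = normalize_keys($raw);
print $clean->{seq_any}[0]{file}{source_language}, "\n";    # prints "en"
```

The copy leaves the original structure untouched, which matters if you later need to write the XML back out with its original element names. The collision check is there because two distinct keys can collapse to the same cleaned name.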

  • Very interesting. I like it!
  • For that type of problem, I find that using XML::LibXML and XPath leads to code with even less syntax.
    • Sometimes, however, XML::LibXML is not an option. In the example above, I use XML::Compile (which, BTW, uses XML::LibXML internally), which is currently the best option if you want to work with schema-compliant XML documents (especially if you want to create and modify them). The other such module I often use is XML::TreePP, which is a pure-Perl solution and is available in environments where XML::LibXML isn't.