All the Perl that's Practical to Extract and Report

posted by pudge on 2001.11.04 18:22
redsquirrel sends in his tale of XML adventures:

My current project loomed over my cube like a foreboding fortress. My usually focused mind was easily distracted as I thrashed about, trying to come up with a "magical solution" to the overwhelming problem before me.

The problem: I work for a large organization (perhaps part of the problem :). We have numerous web applications that must have a company-wide look-and-feel. I must design a system that can manage all of these applications' HTML templates from a single location.

When I embarked on this quest for the "magical solution," I was advised to look into XML. Immediately, I was comforted by the security of knowing that other Perl hackers had walked this path before. I knew that if I could learn enough XML to get started, a visit to CPAN would surely provide me with the resources I needed.

Once I had a few XML documents under my belt, I headed off to CPAN to pick up some tools. XML::Parser was the first module I came across. Due to my laziness, though, I quickly became disheartened at the number of styles, handlers, and constructor arguments I had to deal with.

I continued down the aisle until I found XML::Simple. This was more like it! A few lines of code produced an XML document packed into a neatly organized Perl data structure, ready for manipulation! Ahh, laziness has its rewards. My triumph was short-lived. As I started doing some preliminary tests with XML::Simple, I learned about DTDs and "valid" XML documents. I wondered if XML::Simple validated the XML it so nicely parsed. The module lived up to its name and did not.
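
For readers who have not seen it, the "few lines of code" with XML::Simple look roughly like the sketch below. The template XML and its element names are invented for illustration; only XMLin() comes from the module itself.

```perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin() and XMLout()

# Hypothetical template description, made up for this example.
my $xml = <<'XML';
<template site="intranet">
  <title>Company Home</title>
  <stylesheet>/css/site.css</stylesheet>
</template>
XML

# XMLin() accepts a filename or a string of XML and returns a plain
# hash reference. It checks well-formedness only; the DTD is not enforced.
my $tmpl = XMLin($xml);

print "$tmpl->{site}: $tmpl->{title}\n";
```

The root element is dropped by default, so its attributes and child elements end up as keys of the top-level hash.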

With a slightly more educated eye, another perusal of CPAN produced the "magical solution" I was looking for (and ironically brought me back to where I had started). XML::Checker::Parser extends XML::Parser, and this time, it didn't scare me quite so much. I held my laziness in check long enough to do some preliminary testing with the less "Simple" interface and was handsomely rewarded with the ability to create a "valid" XML Perl object in just three lines:

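use XML::Checker::Parser;
my $cp = XML::Checker::Parser->new(Style => 'Objects');  # direct constructor call instead of the indirect "new Class" syntax
my $obj = $cp->parsefile($xml_doc);  # $xml_doc holds the path to the XML file
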
Now, when I look upon the fortress that is this project, it does not feel as intimidating as it once did. My coding skills alone would be futile against such a task, but utilizing this module has given me access to the powerful skills of its authors: Enno Derksen (XML::Checker), Clark Cooper (XML::Parser v2.x), and Larry Wall (XML::Parser v1.0). Perl's power isn't found in any one individual's abilities; it is found in the collective strength of the multitude of CPAN authors.
  • Unfortunately XML::Checker::Parser is a bit buggy, and is known to not be a 100% correctly validating parser (I'm not sure what the fringe conditions are - it's changed developers recently, and hasn't been run through a validation suite to test it).

    As an alternative, try XML::LibXML's validating mode. It's based on libxml2, which is a validating parser that is known to be good and well tested. Yes, I'm biased, but I will admit there are some bugs in the validating code in XML::LibXML (not in libxml2).
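
A minimal sketch of the validating mode mentioned in the comment above, assuming XML::LibXML is installed. The documents and their inline DTD are invented for illustration, and validates() is a hypothetical helper; validation() and parse_string() are the parser's own methods.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD: a <template> must contain exactly one <title>.
my $dtd = <<'DTD';
<!DOCTYPE template [
  <!ELEMENT template (title)>
  <!ELEMENT title (#PCDATA)>
]>
DTD

my $good = $dtd . "<template><title>Home</title></template>";
my $bad  = $dtd . "<template><heading>Home</heading></template>";

my $parser = XML::LibXML->new;
$parser->validation(1);    # check against the DTD while parsing

# Hypothetical helper: true if the document parses and validates.
# The parser dies on a validation error, so the eval traps it.
sub validates {
    my ($xml) = @_;
    my $doc = eval { $parser->parse_string($xml) };
    return defined $doc ? 1 : 0;
}

for my $xml ($good, $bad) {
    print validates($xml) ? "valid\n" : "rejected\n";
}
```

Wrapping the parse in eval keeps a single invalid template from killing the whole run, which matters when many documents are checked in one pass.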

  • You can't use the Objects style if you have an element named "Characters" or any element has an attribute named "Kids" as they will clash with the builtin naming convention.

    It's probably best to use something like XML::XPath, which wraps around XML::Parser. This creates a tree of objects with no reserved-word problems and also lets you wander around using XPath, which is a W3C standard and quite nice to use. Knowing it could help you in lots of non-Perl situations too.
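
As a rough illustration of the XML::XPath suggestion above, the sketch below queries a made-up document that deliberately uses an element named "Characters" and an attribute named "Kids", the names the comment says clash with the Objects style. The document is hypothetical; new(), findvalue(), and findnodes() are XML::XPath's own methods.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical document, invented for this example.
my $xml = <<'XML';
<page>
  <Characters Kids="2">
    <name>Alice</name>
    <name>Bob</name>
  </Characters>
</page>
XML

my $xp = XML::XPath->new(xml => $xml);

# findvalue() gives the string value of whatever the path matches.
my $kids = $xp->findvalue('/page/Characters/@Kids');

# In list context, findnodes() returns the matching nodes in document order.
my @names = map { $_->string_value } $xp->findnodes('/page/Characters/name');

print "Kids=$kids names=@names\n";
```

Because nodes are addressed by path rather than by hash key or package name, element and attribute names never collide with the parser's own bookkeeping.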
  • Good to know. I will look into XML::SAX::Simple and XML::XPath. Sounds like they are superior solutions!
  • I just think I need to mention that XML::Simple, while easy to use for config files and data-oriented XML generally, cannot be used for XHTML: it does not process mixed content, i.e. text and tags mixed, as in <p>this is <b>mixed</b> content</p>.

    If you want to process that kind of XML you will have to use either XML::Parser itself (XML::Simple is based on XML::Parser) or XML::Parser::PerlSAX or XML::XPath, or XML::Parser::PurePerl (which would make Matt _really_ happy ;--) or (of course!)

  • I was recently tasked with taking invalid XML (XML so broken that it was not even well-formed, i.e. missing closing tags, etc.). No DTD, no schema, no namespaces. So the search for an XML validator began.

    The XML contained critical medical information, which is itself our company's cash cow. Anyway, this stuff was totally ****ed and we needed to fix it so we could improve our current, and quite legacy, HyperCard editorial system.

    Yes, HyperCard in 2006.

    Anyway, the point is, that afte