
Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Friday September 30, 2005
03:08 PM

Yet *another* XML Module?

[ #26952 ]

See also the Perlmonks posting.

I can't find the link right now, but Joel Spolsky once wrote about core business needs. If you need something to be done perfectly and it's a core function of your business, do it yourself. Don't outsource it or use someone else's code. While I am not that strident (I wouldn't dream of writing an alternative to DBI), I have to confess that the slew of XML modules out there is often painful to use. Just painful.

I know many would rather not see another module in the XML namespace, but I've started putting together a proof of concept called XML::Composer (it was originally called XML::JFDI). It will let you write really, really bad XML. It will also allow you to write XML "variants" like the awful Yahoo! IDIF format. Namespaces? It doesn't know about 'em or care about 'em. If you want to use a colon in an attribute name, go ahead. Want to inject an XML snippet from somewhere else? Go ahead. It will simply do whatever you tell it to do.

The primary idea is to be able to write XML in just about any format you want. Frequently this means that it will cheerfully produce bad XML, but rather than spend hours scouring the docs only to find yourself reporting another bug, you can quickly and easily tweak what you want.

use XML::Composer;

my $xml = XML::Composer->new({
    # tag         method
    'ns:foo'  => 'foo',
    'bar'     => 'bar',
    'ns2:baz' => 'baz',
});
$xml->Decl; # add declaration (optional)
$xml->PI('xml-stylesheet', {type => 'text/xsl', href => $xslt_url});
$xml->foo(
    { id => 3, 'xmlns:ns2' => $url},
    $xml->Comment('this is a > comment'),
    $xml->Raw('<bad tag>'),
    $xml->baz({'asdf:some_attr' => 'value'}, 'whee!'),
);
print $xml->Out;
if ($xml->Validate) { # false for above example
    # ...
}
__END__
<?xml version="1.0">
<?xml-stylesheet type="text/xsl" href="$xslt_url"?>
<ns:foo id="3" xmlns:ns2="$url">
  <!-- this is a &gt; comment -->
  <bad tag>
  <ns2:baz asdf:some_attr="value">whee!</ns2:baz>
</ns:foo>

This module (not yet on the CPAN) assumes that you, the programmer, know what you're doing. It's also designed for agile development: produce something quickly but have a test suite to verify that your output is correct. In short, there is a very specific design philosophy here and I wouldn't recommend it for those who don't have test suites, but it makes coding very fast (and no, it doesn't use AUTOLOAD to build those methods).
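
For readers wondering how a tag-to-method map can work without AUTOLOAD, here is a minimal sketch of one way to do it: install one closure per declared tag into the class. This is not XML::Composer's actual code, which has not been published. The package name Sketch::TagWriter and its add_tag method are hypothetical, and the sketch deliberately does no escaping or validation.

```perl
use strict;
use warnings;

package Sketch::TagWriter;    # hypothetical name, not the real module

sub new {
    my ($class, $tag_for) = @_;    # { tag => method_name, ... }
    my $self = bless {}, $class;
    $self->add_tag($_, $tag_for->{$_}) for sort keys %$tag_for;
    return $self;
}

# Install a real method for one tag. Because only declared methods
# exist, a typo fails with Perl's usual "Can't locate object method"
# error. Note the methods are installed per class, not per object.
sub add_tag {
    my ($self, $tag, $method) = @_;
    no strict 'refs';
    *{ ref($self) . "::$method" } = sub {
        my ($obj, @content) = @_;
        # An optional leading hash ref holds the attributes.
        my $attr  = ref($content[0]) eq 'HASH' ? shift @content : {};
        my $attrs = join '', map { qq{ $_="$attr->{$_}"} } sort keys %$attr;
        return "<$tag$attrs>" . join('', @content) . "</$tag>";
    };
    return $self;
}

package main;

my $xml = Sketch::TagWriter->new({ 'ns:foo' => 'foo', 'ns2:baz' => 'baz' });
print $xml->foo({ id => 3 }, $xml->baz('whee!')), "\n";

# A misspelled method dies instead of silently emitting a bogus tag.
my $ok = eval { $xml->feild('foobared'); 1 };
print "typo caught: $@" unless $ok;
```

The first print emits <ns:foo id="3"><ns2:baz>whee!</ns2:baz></ns:foo>, and the misspelled call is trapped by the eval rather than producing a <feild> element.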

  • If it's not well formed, it's not XML. Period. Please don't encourage such vileness into the world.

    At least, if you do, don't put it into the XML namespace because that's not what it's producing.

    Sorry to come across mad about this, but dealing with invalid XML (and worse, SGML) over the last 5 years has made me bitter and twisted.

    For your well formed XML generation needs, I'm open to suggestions as to how I can improve XML::Genx (within the constraints of the underlying library).


    • You really want to see the Perlmonks thread on this (I linked to it in the parent story). XML::Composer can easily produce valid XML and, to be honest, it produces XML much easier than most of the XML modules out there. Rather than slapping my hand when I use a namespace which is illegal or upper-case tags or improperly escaped data (shudder), it trusts me to really mean what I say. However, the fact is that we often have to deal with bad XML and there's no way around it. I hate it. You hate it. We ha

      • sigh. I certainly see the need. It just makes me cry. :-)

        And I apologise; I should have read the perlmonks thread first.

        As to the name, how about XML::NotWellFormed? It's the most accurate description even if it is an oxymoron.


      • I still don’t like the idea, even as I understand the predicament. I would suggest you use a templating system instead of writing a module for this. Text::Template and the Template Toolkit can easily produce arbitrarily complex and arbitrarily broken XML output.

        (Oh, and please get in touch with the people who’re asking for broken XML from you and call them bozos. Not offensively, of course.)

        • Believe me, I already had a phone call with a Yahoo! rep. He was very apologetic but there's not much I can do as a lone developer to shove Yahoo!

  • A friend and I had a similar discussion about generating XML in a simple way today -- we trawled CPAN and found XML::Generator. Does it do (most of) what you're looking for?

    • I had looked at XML::Generator and I liked it, but it had some problems. First, because of autoload, it's easy to do this:

      print $xml->feild('foobared');

      I probably meant "field". My version forces you to map methods to tags and will die if you try to print a tag that doesn't exist (though you can add methods/tags on the fly).

      Also, as far as I can tell, I would not be able to conveniently dump out data in the Yahoo! IDIF format. Here's a snippet:

      <?xml version="1.0"?>

      • I probably meant “field”. My version forces you to map methods to tags and will die if you try to print a tag that doesn’t exist (though you can add methods/tags on the fly).

        What if you wrote the correct tag name, but it gets inserted in the wrong place? What if the content is bad or a required attribute is missing?

        Of course, a schema can only be used to validate well-formed documents…

      • This is not XML, so really it should not be labelled as such. AFAICT it is pretty close to being SGML though. It might even be SGML, the XML declaration would be seen by an SGML parser as a regular PI, you can add a DTD when you parse the file, and unenclosed tags and &\W are valid in SGML. You just need the characters to be in latin1.

        So why don't you go and pollute the SGML namespace ;--)

        Seriously, I don't think you should release your module in the XML namespace. An IDIF module would be OK, it its o

        • I have decided not to put this in the XML namespace. That much I agree on.

          I think my main problem with your module is the way you seem to advertise it, which sounds a bit like "let's generate more quasi XML to p.o. real XML guys" to me.

          I can see how that might appear to be what I was doing. I'll be sure to clarify that. Unfortunately, this is a real problem space that developers constantly face: legacy XML "variants" or third-party resources which require malformed XML. Since it's not always possib

          • down to the ordering of attributes

            I feel your pain, having had to deal with exactly that request for XML::Twig (not a model of XML purity itself): apparently some Microsoft tool needs attributes in a specific order. The easiest solution I found was to use Tie::IxHash objects to store the attributes.
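
            The ordering idea can also be sketched without Tie::IxHash by passing attributes as an ordered list of pairs. This is a dependency-free stand-in for illustration, not what XML::Twig does, and render_tag is a hypothetical helper that does no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: attributes arrive as a flat list of name/value
# pairs in an array ref, so the caller's order is the output order.
sub render_tag {
    my ($tag, $attr_pairs, $content) = @_;
    my @pairs = @$attr_pairs;    # copy, so the caller's list is untouched
    my $attrs = '';
    while (my ($name, $value) = splice @pairs, 0, 2) {
        $attrs .= qq{ $name="$value"};
    }
    return "<$tag$attrs>$content</$tag>";
}

# Attributes come out in the order given, not sorted or hash-ordered.
print render_tag('item', [ zeta => 1, alpha => 2, mid => 3 ], 'text'), "\n";
```

            With Tie::IxHash the calling code can keep using a hash, since a tied hash returns its keys in insertion order; the array-of-pairs form above trades that convenience for having no non-core dependency.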

      • Also, as far as I can tell, I would not be able to conveniently dump out data in the Yahoo! IDIF format. Here's a snippet:

        That is not even remotely well-formed XML (note the unclosed "br" tags, for example), but it's perfectly valid for Yahoo!'s IDIF format. Trying to produce a bunch of stuff like this with most XML modules is what finally led me to start writing my own. From what I can see, XML::Generator will not allow me to write that.

        I'm sure the usefulness of this information is long gone for you,

  • In Defense of Not-Invented-Here Syndrome

    Of course, Joel is fun to read because he writes a lot more confrontationally than the subject really warrants. I wrote something related recently.

  • What we need is less bad not-quite-XML and more good XML. For generating good XML, the libraries that guarantee producing correct XML are the way to go. My impression is that they are easier to use than the not-so-nice ones because you don't have to worry about screwing up. They do things like always encoding strings, always using UTF-8, always worrying about namespaces, and complaining about improperly nested tags.

    If you have to generate not-quite-XML or bad XML, then you have a bigger problem. I am no

    • But there's plenty of bad XML out there already and there are programmers who have no choice but to implement it, particularly if it's a third party requirement. Usually this bad XML tends to cause plenty of problems. Why have even more problems by creating yet another hand-rolled module which may or may not do what you want? Programmers in this unfortunate situation should at least be able to get the job done and not waste time having to reimplement something.

      The good thing about Data::XML::Variant is