
drhyde (1683)

http://www.cantrell.org.uk/david

Journal of drhyde (1683)

Monday January 29, 2007
05:20 AM

XML::Tiny

[ #32269 ]
I have been continually frustrated by XML modules. They were all either hard to use, had ridiculous dependencies, took too much memory, or some horrible Lovecraftian combination of those. So, when Adam Kennedy recently muttered on the datetime mailing list about his *::Tiny modules, I decided to do XML right. The result is XML::Tiny, which (according to its documentation) implements a useful subset of XML. Secretly, I think it implements the useful subset. The core parser is less than 20 lines of code and is sufficient to parse an RSS feed or the responses from Amazon's web services. It should be compatible with perl 5.004_05 and with the output of XML::Parser in its XML::Parser::EasyTree style, has no dependencies outside the core, and consumes as near as damnit no memory.
  • That this isn't an XML parser. It is majorly broken in various ways. It's more of a tag-soup parser. I still think you should change the name of it.
    • You're right. It should be called "XML::Retarded", and come with a special helmet that its users should have to wear for if/when they walk into a door and go bonk and cry.

      And what's this guy going on about the other modules having too many dependencies? XML::Parser::Lite has none, last I looked. And wasn't RETARDED.

        • His unescaping logic is daffy, and his attribute syntax rejects half the XML that I write in the course of a day. But then, he could always say "I meant to do that!".

          He (or you!) should just grab XML::Parser::Lite and release it as a standalone module. The fact that XML::Parser::Lite is currently available only in that SOAP dist is a great big mistake-- and a mistake which bears fixing, which is more than I can say for this XML::Tiny mess.

          • "His unescaping logic is daffy" is not constructive criticism. Please try again.

            "His attribute syntax rejects half the XML that I write in the course of a day" is not constructive criticism. Please try again.

            That is, please try again if you expect me to pay any attention.

            • The spec is right there. It's easy to read. It's even pretty easy to write a valid parser for it (I wrote XML::SAX::PurePerl in a couple of days, although it is admittedly lacking in a few areas, but it tries). There's even a bunch of fairly good test suites out there for XML. Try James Clark's stuff for a starter. Or fire the stuff in XML::SAX::PurePerl at it. People don't have time to list all the XML parsing bugs out for you when it purposely avoids being an XML parser. But if you're going to call it an
              • I was careful to list the ways in which it is not compliant with the spec. If despite that people expect it to comply with the spec then they really are far too stupid for me to care about.

                I would note that despite the doom-saying XML fanboys in this thread, I have had more feedback from people saying "thanks for this really useful module" than I have for any other module I have released. Funny, that. Y'know, I think that because I'm such a nice person, I'll continue in my evil ways and keep releasing

                • I was careful to list the ways in which it is not compliant with the spec.

                  Your module's key point of differentiation seems to be that it does not, in fact, comply with the XML spec. Why then are you surprised that people don't think it belongs in CPAN's 'XML' namespace?

                  I'm not sure that referring to people as "XML Fanboys" is helpful either.

                  • On the contrary, its key point of differentiation is that it implements a significant chunk of the XML spec - enough of the spec to be very useful - while imposing minimal burdens on the user.

                    There's an awful lot of things in life and on the CPAN which, like this module, are sufficiently complete to be useful without being perfect. Paracetamol is one example. It is sold as a pain reliever, despite the fact that under some circumstances it will rot the liver, which I am sure is not a particularly pleasan

          • Re: (Score:2, Informative)

            The fact that XML::Parser::Lite is currently available only in that SOAP dist is a great big mistake-- and a mistake which bears fixing

            Yes. I've just packaged it up and uploaded it to CPAN, and I'm currently waiting to hear back from the authors about getting co-maint permissions.

    • "It is majorly broken in various ways" is not constructive criticism. Please try again. If you can find ways in which it is *actually* broken - i.e., where it doesn't conform to the documentation, or the tests are inadequate - then I would prefer that you open a ticket using rt.cpan.org [cpan.org], although I will also accept submissions by email.

      As for "It's ... a tag-soup parser" - yes, well done, you spotted that it is a non-validating XML module. I know this may come as a shock, but such things are actually use

      • You're entirely missing the point. This is not about being a subset of the functionality or anything, this is not an XML parser. The problem is not about validation -- no one sane cares about that -- the problem is about well-formedness. There is a lot of perfectly good XML out there that this thing won't parse, and there is a lot of completely wrong XML that it won't flag as such.

        If you want to write and release a parser for "vague stuff that has angle brackets in it" and find it useful that's won

        -- Robin Berjon [berjon.com]
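The well-formedness point above can be made concrete with a small sketch. It is not taken from XML::Tiny or any other CPAN module, and it says nothing about how XML::Tiny itself behaves; `tags_balance` and `soup_names` are hypothetical helpers written only to show the difference between checking that elements nest properly and merely scanning for angle brackets. The balance check itself ignores comments, CDATA, processing instructions and DTDs, so it is an illustration rather than a parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every start tag is closed by a matching
# end tag in the right order, else 0. Only element tags are considered.
sub tags_balance {
    my ($xml) = @_;
    my @open;
    while ($xml =~ m{<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>}g) {
        my ($closing, $name, $empty) = ($1, $2, $3);
        next if $empty;                 # <br/> opens and closes itself
        if ($closing) {
            # An end tag must match the most recently opened element.
            return 0 unless @open && pop(@open) eq $name;
        }
        else {
            push @open, $name;
        }
    }
    return @open ? 0 : 1;               # leftover open elements are an error
}

# Hypothetical tag-soup scanner: collects start-tag names and never complains.
sub soup_names {
    my ($xml) = @_;
    return [ $xml =~ m{<([A-Za-z_][\w.-]*)}g ];
}

my $good = '<a><b>text</b></a>';
my $bad  = '<a><b>text</a></b>';        # overlapping elements: not well-formed

print tags_balance($good), "\n";                # 1
print tags_balance($bad),  "\n";                # 0
print join(',', @{ soup_names($bad) }), "\n";   # a,b
```

The scanner returns the same names for both inputs, which is the complaint in a nutshell: a tool that never rejects the second document is answering a different question from the one an XML parser answers.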

        • Re: (Score:2, Insightful)

          CSS::Tiny [cpan.org] does not parse the entire CSS specification.

          YAML::Tiny [cpan.org] does not support the entire YAML specification.

          Config::Tiny [cpan.org] does not support some elements of the Windows .ini specification.

          Taken literally, CSS::Tiny is not a CSS parser, YAML::Tiny is not a YAML parser and Config::Tiny is not a Windows .ini parser.

          In exchange for sacrificing some completeness and correctness, ::Tiny modules provide a single small .pm file you can drop onto any Perl you are likely to find in existence anywhere, copied in by
          • As I said, if people find this sort of thing useful, let them have it, I have no issue with that. Just don't call it XML. It's not. NotXML::Tiny or TagParser::Tiny would be fine. Furthermore the documentation is extremely misleading in that it claims to support a subset of XML -- that is not the case. It would support a subset of XML if every document it understood could also be successfully parsed by an XML parser, but didn't support some things that an XML parser would report. As it is, it supports all

            -- Robin Berjon [berjon.com]

            • One excellent reason for calling it XML::Tiny rather than SomethingElse::Tiny which I've not mentioned yet is that that's what users will look for. Users who need to process simple XML documents will, obviously, look for XML stuff. If I were to call it SomethingElse::Tiny I wouldn't be able to help them. I suppose you could call that "banking on the holy trigraph" but given that I don't actually care whether anyone else uses it, it must be a very small bank.

              If you were to bother to tell me *what* it su

              • Although to be fair you did claim that you "decided to do XML right" and that might be why those that have "done XML right" got out of their pram.
            • Nah, what they're complaining about is that they think users are too stupid to read the documentation before using the software, and so won't be aware of its limitations.

              Personally, I have a considerably higher opinion of the users than that.

              • I don't think that users are too stupid to read the documentation, but I do think that they will tend to believe it when they read it, and unfortunately it is misleading, as explained in http://use.perl.org/comments.pl?sid=34409&cid=52943 [perl.org], and will waste their time. That's not playing nice.

                -- Robin Berjon [berjon.com]

                  • At first glance, a great number of those failures seem to be to do with broken entity declarations which, of course, I don't spot. I'll certainly take a look though and see if there's any quick wins to be had. Thanks!
        • One of the areas in which I *am* compliant with the spec is that I decode those entities that I know about. Without supporting those are &, >, <, ", and '. To not decode those would be an error. Sure, getting rid of that would please you, but only at the expense of having to put up with a different set of ingrates whining about it.
          • Haha, amusingly this site doesn't escape certain characters, thus making what I wrote utterly unreadable ;-)
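For readers trying to reconstruct the mangled list: the XML 1.0 specification predefines exactly five entities (amp, lt, gt, quot, apos), and it seems reasonable to assume those are the ones meant. The following is a minimal sketch of decoding them in Perl. It is not XML::Tiny's actual code, and `decode_predefined` is a hypothetical name chosen for the example.

```perl
use strict;
use warnings;

# The five entities predefined by the XML 1.0 specification.
my %PREDEFINED = (
    amp  => '&',
    lt   => '<',
    gt   => '>',
    quot => '"',
    apos => "'",
);

# Hypothetical helper: decode only the predefined entities, in one pass.
# A single substitution avoids double-decoding: "&amp;lt;" must become
# "&lt;" and not "<". Unknown entities are left untouched.
sub decode_predefined {
    my ($text) = @_;
    $text =~ s/&(amp|lt|gt|quot|apos);/$PREDEFINED{$1}/g;
    return $text;
}

print decode_predefined('Fish &amp; chips &lt;b&gt;'), "\n";   # Fish & chips <b>
print decode_predefined('&amp;lt;'), "\n";                     # &lt;
```

Doing the replacement with one regex rather than five sequential ones is the design choice that matters here: decoding `&amp;` first and `&lt;` afterwards would turn the escaped text `&amp;lt;` into a literal `<`.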
        • Because I'm such a nice person, and because you asked so nicely (oh, wait, you didn't) I just added support for CDATA.
      • That's like saying Ruby is "more than one way to do Perl". No, it's Ruby, not Perl. There are parsers for Ruby that work just fine, but don't call them Perl parsers.
  • ...and consumes as near as damnit no memory.

    Well, at least until parsefile() slurps that 20MB XML file anyway ;-)
    • Yeah, someone else spotted that too - fixed in 1.01, which will hit the CPAN shortly.
      • The synopsis shows:

            my $document = parsefile($xmlfile);

        But it doesn't tell me what $document looks like. Am I going to call methods on it to get what's in it? Is it a bunch of hashes of arrays of hashes? How do I use it?

        I'm sure that's answered later in the docs, but to me the synopsis ought to demonstrate enough of the API to give me the flavor of it so I can decide if I want to spend the time reading the docs to learn the module.

        --
        J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
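As a partial answer to the question above, here is a rough sketch of what such a `$document` could look like, assuming it follows the XML::Parser::EasyTree style mentioned in the journal entry: nested array and hash references, no objects or methods. The tree is built by hand rather than by calling the module, so the exact keys (`type`, `name`, `attrib`, `content`) are an assumption to be checked against the module's documentation, and `text_of` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-built tree in the EasyTree style, standing in for what parsing
# '<feed version="1"><title>Hello</title></feed>' might return.
# This literal is an assumption for illustration only.
my $document = [
    {
        type    => 'e',                     # 'e' marks an element node
        name    => 'feed',
        attrib  => { version => '1' },
        content => [
            {
                type    => 'e',
                name    => 'title',
                attrib  => {},
                content => [ { type => 't', content => 'Hello' } ],  # 't' marks text
            },
        ],
    },
];

# Hypothetical helper: concatenate all text found at or below a node.
sub text_of {
    my ($node) = @_;
    return $node->{content} if $node->{type} eq 't';
    return join '', map { text_of($_) } @{ $node->{content} };
}

my $root    = $document->[0];
my ($title) = grep { $_->{type} eq 'e' && $_->{name} eq 'title' }
              @{ $root->{content} };

print "$root->{name} v$root->{attrib}{version}: ", text_of($title), "\n";
# prints "feed v1: Hello"
```

If the assumption holds, using the result is a matter of ordinary dereferencing and `grep`, which is the kind of flavour a synopsis could show in two or three lines.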