NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • This isn't an XML parser. It is majorly broken in various ways. It's more of a tag-soup parser. I still think you should change its name.
    • "It is majorly broken in various ways" is not constructive criticism. Please try again. If you can find ways in which it is *actually* broken (i.e., where it doesn't conform to the documentation, or where the tests are inadequate), then I would prefer that you open a ticket on rt.cpan.org [cpan.org], although I will also accept submissions by email.

      As for "It's ... a tag-soup parser": yes, well done, you spotted that it is a non-validating XML module. I know this may come as a shock, but such things are actually use

      • You're entirely missing the point. This is not about implementing a subset of the functionality or anything like that; this is not an XML parser. The problem is not validation (no one sane cares about that); the problem is well-formedness. There is a lot of perfectly good XML out there that this thing won't parse, and a lot of completely wrong XML that it won't flag as such.

        If you want to write and release a parser for "vague stuff that has angle brackets in it" and find it useful, that's won

        --

        -- Robin Berjon [berjon.com]

        • Re: (Score:2, Insightful)

          CSS::Tiny [cpan.org] does not parse the entire CSS specification.

          YAML::Tiny [cpan.org] does not support the entire YAML specification.

          Config::Tiny [cpan.org] does not support some elements of the Windows .ini specification.

          Taken literally, CSS::Tiny is not a CSS parser, YAML::Tiny is not a YAML parser and Config::Tiny is not a Windows .ini parser.

          In exchange for sacrificing some completeness and correctness, ::Tiny modules provide a single small .pm file you can drop onto any Perl you are likely to find in existence anywhere, copied in by
          • As I said, if people find this sort of thing useful, let them have it; I have no issue with that. Just don't call it XML. It's not. NotXML::Tiny or TagParser::Tiny would be fine. Furthermore, the documentation is extremely misleading in that it claims to support a subset of XML, and that is not the case. It would support a subset of XML if every document it understood could also be successfully parsed by an XML parser, even though it didn't support some things that an XML parser would report. As it is, it accepts all sorts of input that an XML parser rejects. The documentation's claim that it is a parser for an XML subset rather than a tag-soup parser is way off the mark.

            I don't have a problem with a Duck::Tiny that would sort of go "quac" instead of "quack"; that happens all the time and still produces useful tools. There does exist a useful proper subset of XML; this is definitely not it. The documentation takes some sort of populist moral high ground about implementing the useful stuff and being decried by a bunch of holier-than-thou pedants. That's all rather subjective chaff, as making the opposite claim in a similar fashion would be, so I decided to put it through a reality test. I gathered about 2,000 XML documents that come from real-world projects and are definitely not complex, not high-brow, and do not require schemata or anything like that. They are mostly basic Web stuff like XHTML, SVG and XSLT, plus some configuration files and some XUL; nothing fancy. It turns out that XML::Tiny does something useful with 83 of them. Entities, PIs (processing instructions), and CDATA sections are not rarely used features after all. And handling charsets is very useful, because doing it yourself is a bitch. Of course, almost all of those 83 documents use namespaces, which XML::Tiny doesn't report, so it's not helping all that much.

            The fact is, in the real world, people use XML in many different ways. That is different from, for instance, CSS or .ini files, where there does tend to be a very uniform common subset. A lot of smart and pragmatic people have tried to define a smaller XML, and failed. Claiming success is just as misleading as the claims of those people who regularly pop up announcing radically new compression methods that can compress their own output ad infinitum.

            So if it keeps its name, at the very least the documentation should be a whole lot more explicit about its severe limitations. Furthermore, since David doesn't seem to care in the least about adherence to the spec, I really don't understand his resistance to calling it TagSoup::Tiny. Why use the "Sacred Trigraph 'XML'", as he calls it, when it is in every way a tag-soup parser, and when everyone knows and agrees that tag-soup parsers are very useful? Banking on the holy trigraph? Pigheadedness?

            --

            -- Robin Berjon [berjon.com]

            • One excellent reason for calling it XML::Tiny rather than SomethingElse::Tiny, which I've not mentioned yet, is that that's what users will look for. Users who need to process simple XML documents will, obviously, look for XML stuff. If I were to call it SomethingElse::Tiny, I wouldn't be able to help them. I suppose you could call that "banking on the holy trigraph", but given that I don't actually care whether anyone else uses it, it must be a very small bank.

              If you were to bother to tell me *what* it su

              • Although, to be fair, you did claim that you "decided to do XML right", and that might be why those who have "done XML right" got out of their pram.
                • I have to agree here. I can see the value of this module: if you have some tiny chunk of very simple XML and you want something Tiny to just get it into a usable Perl structure, then sure, this is a plausible solution. However, saying that you've done XML right and implemented (in your view) "the useful subset" of XML is obviously going to rile people who have dedicated time to actually implementing robust, if heavy, solutions. I tried the module, tossed an everyday XML file from my $work at
            • What sort of thing were the 83 it was useful for?
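The well-formedness argument running through this thread can be made concrete with any conforming parser. What follows is a minimal sketch, not a test of XML::Tiny itself: it uses Python's standard-library xml.etree.ElementTree purely as a stand-in conforming parser, and the sample documents and the is_well_formed helper are hypothetical illustrations. It shows a conforming parser accepting a small document that uses a processing instruction, entity references, a CDATA section and a default namespace (reporting the namespace and the decoded text), and rejecting fragments that are not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document using the features discussed above:
# a processing instruction, predefined and numeric character references,
# a CDATA section, and a default namespace.
GOOD = """<?xml version="1.0"?>
<?xml-stylesheet href="style.xsl" type="text/xsl"?>
<doc xmlns="http://example.com/ns">
  <title>R&amp;D &lt;draft&gt; &#35;1</title>
  <code><![CDATA[if (a < b && c > d) { }]]></code>
</doc>"""

# Hypothetical fragments that are NOT well-formed XML; a conforming
# parser must report an error for each of them.
BAD = {
    "improper nesting": "<a><b></a></b>",
    "bare ampersand": "<a>Fish & Chips</a>",
    "two root elements": "<a/><b/>",
    "duplicate attribute": "<a x='1' x='2'/>",
}


def is_well_formed(text):
    """Return True if ElementTree accepts text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    root = ET.fromstring(GOOD)
    ns = {"n": "http://example.com/ns"}
    # The namespace URI is reported as part of the element name.
    print(root.tag)
    # Entity and character references are decoded into plain text.
    print(root.find("n:title", ns).text)
    # CDATA content comes back verbatim, with < and & intact.
    print(root.find("n:code", ns).text)
    # The processing instruction was accepted; ElementTree drops PIs by default.
    for label, fragment in BAD.items():
        print(label, "->", "accepted" if is_well_formed(fragment) else "rejected")
```

Running it prints the namespaced root tag, the decoded title and the verbatim CDATA text, then reports each malformed fragment as rejected. A parser for a true subset of XML would have to reject those same fragments, whatever else it left out.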