
All the Perl that's Practical to Extract and Report

  • by darobin (1316) on 2002.01.21 19:21 (#3483)

    I'd be curious to hear a few more specifics here. SAX is push-parsing (with the tiny extra that you can get a little context from the driver if it provides it), so I don't see how people could get that wrong ;-) As for pull-parsing, it is true that SAX does nothing for that. It shouldn't be too hard to come up with an API for pull-parsing; the trouble is mostly in agreeing on one. I guess that if someone presents a pull-parser system that is reasonable enough, it'll be adopted.
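    To make the push/pull distinction concrete, here is a minimal sketch in Python rather than Perl, using only the standard library (xml.sax for push, xml.etree.ElementTree.XMLPullParser for pull). It is not PerlSAX, and the names TagCollector, push_tags and pull_tags are made up for illustration. In the push version the parser holds control and calls the handler; in the pull version the calling code decides when to feed input and when to ask for events.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = b"<doc><item>one</item><item>two</item></doc>"

# Push: the parser drives and calls our handler for every event.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

def push_tags(data):
    handler = TagCollector()
    xml.sax.parseString(data, handler)  # control stays in the parser until the end
    return handler.tags

# Pull: our code drives, feeding input and asking for events when it wants them.
def pull_tags(data, chunk_size=8):
    parser = ET.XMLPullParser(events=("start",))
    tags = []
    for i in range(0, len(data), chunk_size):
        parser.feed(data[i:i + chunk_size])
        for _event, elem in parser.read_events():
            tags.append(elem.tag)
    parser.close()
    for _event, elem in parser.read_events():  # anything left after close()
        tags.append(elem.tag)
    return tags

print(push_tags(DOC))  # → ['doc', 'item', 'item']
print(pull_tags(DOC))  # → ['doc', 'item', 'item']
```

    Both produce the same events; what differs is who owns the loop.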

    I would think that people's misunderstandings come mostly from thinking that SAX is an XML-only API. You can read anything into SAX, and write SAX out to anything. It just so happens that it was created for XML, and that the data passing through maps to a tree structure (seen token by token) that is coherent with that of XML. I also think that part of the problem is that people who have experience with other data formats misunderstand XML because they expect it to "do something" on its own, as if it were a dedicated syntax. It's just glue for data. You rarely do anything interesting with glue alone (glue sculptures? ;-).
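    A small illustration of "read anything into SAX, write SAX out to anything", again sketched with Python's standard library rather than PerlSAX: a hypothetical csv_to_sax driver turns CSV rows into SAX events, and xml.sax.saxutils.XMLGenerator serialises whatever events it receives. The handler never learns that the source wasn't XML. The sketch assumes the CSV header names are valid XML element names.

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

def csv_to_sax(text, handler):
    """Hypothetical driver: emits SAX events for CSV rows to any ContentHandler.
    Assumes the header row contains valid XML element names."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    handler.startDocument()
    handler.startElement("rows", AttributesImpl({}))
    for row in rows:
        handler.startElement("row", AttributesImpl({}))
        for name, value in zip(header, row):
            handler.startElement(name, AttributesImpl({}))
            handler.characters(value)
            handler.endElement(name)
        handler.endElement("row")
    handler.endElement("rows")
    handler.endDocument()

out = io.StringIO()
csv_to_sax("name,lang\nRobin,Perl\n", XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```

    Swapping XMLGenerator for any other handler (a filter, a tree builder) needs no change to the driver.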

    As for the part that interests me most: how many different interpretations of SAX did you get, and how did they differ? There are a few bugs and inconsistencies in parsers here and there, and some parts of the spec are not implemented anywhere yet (though this is steadily progressing). Apart from that, the SAX model is rather simple. Events have a defined order and defined content. I don't think the situation is comparable to that of POD, not by a long way. At least, not that I've noticed while using any of the SAX2 drivers. Differences were filed as bugs and flattened out.
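    One way to see what "defined order and defined content" buys you: a handler can record the events it receives and check that starts and ends nest exactly as the document does. The Python sketch below uses the stdlib xml.sax parser, not a PerlSAX2 driver, and EventLog and well_nested are hypothetical names.

```python
import xml.sax

class EventLog(xml.sax.ContentHandler):
    """Records start/end element events, with attributes, in delivery order."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))

def well_nested(events):
    """True if every end event closes the most recently opened element."""
    stack = []
    for ev in events:
        if ev[0] == "start":
            stack.append(ev[1])
        elif not stack or stack.pop() != ev[1]:
            return False
    return not stack

log = EventLog()
xml.sax.parseString(b'<a x="1"><b/><c>text</c></a>', log)
print(log.events)
print(well_nested(log.events))  # → True
```

    Note that the empty element <b/> still produces a start and an end event, which is part of what makes the order predictable across drivers.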

    I'm pretty much in the shoes of the guy who edits the PerlSAX2 spec, and I have a strong interest in seeing everyone happy with SAX, so I'd be delighted to have a clearer description of the points on which you think the spec is unclear or allows for differing interpretations. Until there is at least one complete parser (which should happen soon, now that Matt has produced locator code), the spec is considered to be in minimal flux. By minimal I mean that no big change can happen, but minor fixes can. In some cases, this includes things as simple as specifying that data item Foo, when it doesn't exist, must be '' and not undef. Nothing big, but important clarifications if we are to put everyone on the same page :-)

    Knowing what you think is wrong in there would definitely help.


    -- Robin Berjon

    • As for the part that interests me most, how many different interpretations of SAX did you get, and how did they differ?

      A notable point of difference was between people who considered the events to be all that the spec specified, and people who considered the spec to specify events and also parse(), parse_file(), parse_string(), and their behavior. If you read SAX as obliging one to follow the behavior of parse() et al in current parsers, then you can't easily implement something like HTML::TokeParser.
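      For comparison, a TokeParser-style interface can be layered over a push parser: the callbacks only buffer tokens, and the caller pulls them one at a time with no parse() call in sight. This is a Python sketch on the stdlib html.parser.HTMLParser, not HTML::TokeParser's actual implementation; TokenPuller, get_token and the ("S"/"E"/"T", ...) token shape are illustrative, and for simplicity it parses the whole string up front.

```python
from collections import deque
from html.parser import HTMLParser

class TokenPuller(HTMLParser):
    """Hypothetical pull interface over a push parser: callbacks queue tokens,
    the caller fetches them with get_token()."""

    def __init__(self, text):
        super().__init__()
        self._tokens = deque()
        self.feed(text)   # push side runs here and fills the queue
        self.close()

    # Push side: the parser calls these; we only record what we saw.
    def handle_starttag(self, tag, attrs):
        self._tokens.append(("S", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self._tokens.append(("E", tag))

    def handle_data(self, data):
        self._tokens.append(("T", data))

    # Pull side: the caller asks for the next token; None means exhausted.
    def get_token(self):
        return self._tokens.popleft() if self._tokens else None

p = TokenPuller('<p class="x">Hello <b>world</b></p>')
for tok in iter(p.get_token, None):
    print(tok)
```

      The consumer's loop looks like a tokenizer's, even though a push parser does the work underneath.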


      • Ah, thanks for the clarification. That is an area I would never have considered grey, but then I have my nose right inside it all the time, and a lot of the talk that define(s|d) SAX isn't archived, since it happened on IRC.

        Here's an attempted short breakdown of the general idea (as clear as I can make it past 4am). The parse calls (parse, parse_file etc.) are part of the SAX spec, and there's a very good reason for this and for why I very much doubt that it'll change. SAX drivers are meant to be total


        -- Robin Berjon