  • I'd be curious to have a few clarifications here. SAX is push-parsing (with the tiny extra that you can get a little context from the driver if it provides it), so I don't see how people could get that wrong ;-) As for pull-parsing, it is true that SAX does nothing for that. It shouldn't be too hard to come up with an API for pull-parsing; the trouble is mostly in agreeing on one. I guess that if someone presents a pull-parser system that is reasonable enough, it'll be adopted.

    I would think that people's


    -- Robin Berjon

    • As for the part that interests me most, how many different interpretations of SAX did you get, and how did they differ?

      A notable point of difference was between people who considered the events to be all that the spec specified, and people who considered the spec to specify events and also parse(), parse_file(), parse_string(), and their behavior. If you read SAX as obliging one to follow the behavior of parse() et al. in current parsers, then you can't easily implement something like HTML::TokeParser.


      • Ah, thanks for the clarification. That is an area that I would never have considered grey, but then I have my nose right inside it all the time, and a lot of the talk that define(s|d) SAX isn't archived, as it happened on IRC.

        Here's an attempted short breakdown of the general idea (as clear as I can make it past 4am). The parse calls (parse, parse_file, etc.) are part of the SAX spec, and there's a very good reason for this and for why I very much doubt that it'll change. SAX drivers are meant to be totally interchangeable (modulo Features, which are queryable). That's why there's such a thing as XML::SAX::ParserFactory: you ask for a parser, get it, and interact with it without needing to know which parser it is. A bit like DBI, after a fashion.

        Because of that, the parse calls need to be clearly specified, and consistent across parsers.
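
        To make the interchangeability point concrete, here is a minimal sketch. It assumes XML::SAX is installed; the ElementCounter handler is a made-up name for illustration only. The calling code never names a specific driver.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

# Hypothetical handler (name invented for this sketch): it counts
# start_element events per element name.
package ElementCounter;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    $self->{counts}{ $el->{Name} }++;
}

package main;

my $handler = ElementCounter->new;

# Ask the factory for "a parser"; which driver comes back depends on
# what is installed and registered on this machine.
my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);

# The parse calls are the same whichever driver was returned.
$parser->parse_string('<doc><item/><item/></doc>');

print "item elements seen: $handler->{counts}{item}\n";
```

        Swapping the underlying driver should not require touching any of this code, which is the reason the parse calls have to be part of the spec.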

        Now, that does not mean that there isn't a (conceptual) separation between those parts of the interface, and the content of what the events receive. The contents are consistent across all events as you can see, and in fact are meant to be similar in PerlDOM. The only reason why they are presently spec'd in PerlSAX only is because that's the only spec that's mature enough to be spoken of. If someone came up with a pull-parser spec that reused those data items, then it might make sense to have a common spec for the data bits, and separate specs for the interfaces.

        As for grafting a pull-parser API on top of the SAX API, I think that's a bad idea. SAX is push by nature, and it is simpler that way. This means that a pull-parser API would live in a different spec, and be implemented most of the time by different modules. Of course, it doesn't mean that they shouldn't share the same node representations (quite the contrary), but the two should be separated so that no one expects parsers from one side to also be able to behave as parsers from the other side. That would make some of them very hard and others potentially impossible.

        People have been saying that a pull-parser equivalent of SAX would be very cool for almost as long as SAX has existed. However, I was talking with Kip about that earlier, and despite the fact that we all agree it would be nice to have, no one appears to have been itched enough to write one.

        And that's why I'm going to disappoint you and leave you with a "Well, it's possible..." I have no need whatsoever for a pull-parser, SAX fills all of my parsing needs. I agree that a pull-parser would be nice, and should someone endeavour to write one I will help with what I can. Hey, TokeParser has so simple an API that doing the same thing for XML, reusing the node representations from SAX, ought to be a no-brainer (provided one has an underlying parser that supports that type of interaction).
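
        As a rough illustration of that idea, here is a toy sketch in the TokeParser style, where the caller asks for tokens one at a time. ToyPullParser is a hypothetical class, not an existing module, and its regex tokenizer only handles a tiny XML subset (no comments, CDATA, entities, or namespaces). The token hashes are loosely modelled on the SAX node representations; the Type key is an addition of this sketch.

```perl
use strict;
use warnings;

# Hypothetical class, not an existing CPAN module: a toy pull tokenizer.
package ToyPullParser;

sub new {
    my ($class, $xml) = @_;
    return bless { buf => $xml, pending => [] }, $class;
}

# Return the next token as a hash, or undef when input is exhausted.
# The caller drives the loop; nothing is pushed at a handler.
sub get_token {
    my ($self) = @_;
    return shift @{ $self->{pending} } if @{ $self->{pending} };
    for ($self->{buf}) {
        return undef unless length;
        if (s{^</([\w:.-]+)\s*>}{}) {
            return { Type => 'end_element', Name => $1 };
        }
        if (s{^<([\w:.-]+)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*(/?)>}{}) {
            my ($name, $attr_src, $empty) = ($1, $2, $3);
            my %attrs;
            while ($attr_src =~ /([\w:.-]+)\s*=\s*"([^"]*)"/g) {
                # Keyed "{namespace}localname"; namespace is always
                # empty in this toy.
                $attrs{"{}$1"} = { Name => $1, Value => $2 };
            }
            # An empty element yields a start token now and an end
            # token on the next call.
            push @{ $self->{pending} },
                { Type => 'end_element', Name => $name } if $empty;
            return { Type => 'start_element', Name => $name,
                     Attributes => \%attrs };
        }
        if (s{^([^<]+)}{}) {
            return { Type => 'characters', Data => $1 };
        }
        die "unparseable input near: " . substr($_, 0, 20);
    }
}

package main;

my $p = ToyPullParser->new('<doc><item id="1"/>text</doc>');
my @seen;
while (my $t = $p->get_token) {
    push @seen, $t->{Type} eq 'characters'
        ? "text:$t->{Data}"
        : "$t->{Type}:$t->{Name}";
}
print join(' ', @seen), "\n";
```

        A real implementation would need an underlying parser that can be suspended between tokens, which is exactly the proviso above.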

        I, like many others, simply haven't felt the need for that badly enough to scratch the itch so far. Someone deciding to implement such a parser would, however, get lots of support from us folks who are happy as it is with SAX ;-)

        As for HTML::Parser, I know it well and have used it intensively until I found out about XML (in fact I used to use it as an XML parser, more or less). Is there anything from it that you think would be useful to the SAX space? All I can think of off the top of my head is the ability to specify parameter templates but I've always found that that was more of a hassle (even after writing many programs that use it I still need to refer to the docs) than anything else. However, if you really need that feature, it could easily be done as a filter. I won't code that myself though, so here's another "Well, you could..." ;-) Hey, don't blame me. I code SAX every day and have "look, here's what I've done" SAX stuff every day. I can't possibly implement all the SAX-related ideas I hear about ;-) I can, however, promise to support with what means I have anyone that wishes to make an idea SAXy.
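
        To illustrate the filter idea only, here is a toy sketch. TemplateFilter and its template words (name, local, attr) are invented for this example and do not correspond to any existing module. It accepts SAX-style start_element events and hands the callback positional arguments picked by a comma-separated template, roughly in the spirit of HTML::Parser's argument specs. The driver is simulated by calling the handler method by hand.

```perl
use strict;
use warnings;

# Hypothetical filter, not an existing CPAN module.
package TemplateFilter;

# Map template words (invented for this sketch) to extractors that
# pull one value out of a SAX-style element hash.
my %EXTRACT = (
    name  => sub { $_[0]{Name} },
    local => sub { $_[0]{LocalName} },
    attr  => sub {
        # Flatten the SAX attribute structure to a plain name => value hash.
        +{ map { ( $_->{Name} => $_->{Value} ) }
               values %{ $_[0]{Attributes} || {} } };
    },
);

sub new {
    my ($class, %args) = @_;
    my @fields = split /\s*,\s*/, $args{Template};
    for my $f (@fields) {
        die "unknown template field '$f'" unless $EXTRACT{$f};
    }
    return bless { fields => \@fields, callback => $args{Callback} }, $class;
}

# Called like any SAX handler method; re-dispatches to the callback
# with arguments in the order the template asked for.
sub start_element {
    my ($self, $el) = @_;
    $self->{callback}->( map { $EXTRACT{$_}->($el) } @{ $self->{fields} } );
}

package main;

my @calls;
my $filter = TemplateFilter->new(
    Template => 'name, attr',
    Callback => sub {
        my ($name, $attr) = @_;
        push @calls, "$name id=$attr->{id}";
    },
);

# Simulate what a SAX driver would send for <item id="42"/>.
$filter->start_element({
    Name       => 'item',
    LocalName  => 'item',
    Attributes => {
        '{}id' => { Name => 'id', LocalName => 'id', Value => '42' },
    },
});

print "$calls[0]\n";
```

        Because SAX handlers are just objects with the right method names, a filter like this could sit between any driver and the user's callbacks without the driver knowing about it.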


        -- Robin Berjon