
All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I'm not aware of any other XPath-on-streams freaks; let me know if you know of any.

    I'll take a look at that toolkit the next time I dive into the EventPath implementation (see XML::Filter::Dispatcher [cpan.org] for that, those of you interested in streaming XPath implementations).

    Thanks for the pointer.

    - Barrie

    • Somehow I thought you would read this :) As for XPath-on-streams freaks, well, if it doesn't have to include implementers then you can count me in. You can also count the guy working next door to me, who wants to do that kind of thing for our Publisher software. Then you have the STX guys as well. And prolly a number of other people. Lo! We're a crowd, let's take over the world!

      It sure would be nice having an optimized matcher in XFD, especially one that can handle all of XPath on a stream.

      --

      -- Robin Berjon [berjon.com]

      • by barries (2159) on 2003.01.29 11:21 (#16495)
        STX isn't XPath; it's a matching language designed for streaming environments. I've seen mentions of constructs that allow you to collect read-only source trees, but the matching is limited to what's handy without any buffering at all.

        EventPath is a superset of XPath which buffers events as necessary for in-order delivery in the event of possibly out-of-order matching expressions like /a[b]/c. And it gets most things correct :). This Xaos stuff might be a better way of implementing it, however; X::F::D uses continuations (well, really, more like queuing anonymous subs for currying) to match against pending events given that the current event matched.
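To illustrate why an expression like /a[b]/c needs buffering, here is a minimal sketch, not X::F::D's actual code. It assumes the children of a single <a> arrive as a list of hashes with hypothetical `name` and `id` keys, and the function name `match_a_b_c` is made up for this example. A <c> seen before any <b> cannot be delivered yet, so it is held until the predicate is known to be true.

```perl
use strict;
use warnings;

# Hypothetical sketch of "/a[b]/c" over the children of one <a>.
# Returns the ids of the <c> children, in document order, but only
# if <a> also has a <b> child; otherwise returns an empty list.
sub match_a_b_c {
    my (@children) = @_;
    my (@held, @delivered);
    my $saw_b = 0;

    for my $child (@children) {
        if ($child->{name} eq 'b' && !$saw_b) {
            $saw_b = 1;
            # The predicate [b] is now known to be true:
            # flush everything buffered so far, in order.
            push @delivered, splice @held;
        }
        next unless $child->{name} eq 'c';
        if ($saw_b) { push @delivered, $child->{id} }   # deliver immediately
        else        { push @held,      $child->{id} }   # outcome unknown: buffer
    }

    # End of <a>: anything still held belonged to an <a> with no <b>,
    # so it is simply discarded.
    return @delivered;
}

my @with_b = match_a_b_c(
    { name => 'c', id => 1 },
    { name => 'b', id => 2 },
    { name => 'c', id => 3 },
);
my @without_b = match_a_b_c(
    { name => 'c', id => 1 },
    { name => 'c', id => 2 },
);
```

The first <c> is buffered until the <b> turns up, which is the in-order delivery cost described above; once the predicate is settled, later <c> elements pass straight through.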

        So the leading "/" in "/a/b" will match against start_document and queue an anonymous sub to match against a start_element for <a>, which, when it matches, will in turn queue a sub to match a start_element for <b> in <a>'s child events, etc. All of this is achieved by compiling "/a/b" into a large Perl anonymous subroutine (I avoid closures due to all the leaks in older perls; may revisit that now) that gets run in the start_document() handler.
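The queuing scheme for "/a/b" can be sketched roughly as follows. This is a hand-written, simplified illustration and not what XML::Filter::Dispatcher generates: the names `@pending`, `match_a`, `match_b` and the depth bookkeeping are all assumptions made for the example, and the handlers take a bare element name rather than a real SAX event hash.

```perl
use strict;
use warnings;

my @pending;     # queued checks: { depth => N, check => \&sub }
my @matches;     # names of elements that completed the whole path
my $depth = 0;   # current element nesting depth

sub start_document {
    @pending = ();
    @matches = ();
    $depth   = 0;
    # The leading "/" matched: queue a check for a top-level <a>.
    push @pending, { depth => 1, check => \&match_a };
}

sub match_a {
    my ($name) = @_;
    return unless $name eq 'a';
    # <a> matched: queue a check for <b> among <a>'s child events.
    push @pending, { depth => $depth + 1, check => \&match_b };
}

sub match_b {
    my ($name) = @_;
    push @matches, $name if $name eq 'b';
}

sub start_element {
    my ($name) = @_;
    $depth++;
    # grep builds a copy first, so checks may safely queue new entries.
    for my $p (grep { $_->{depth} == $depth } @pending) {
        $p->{check}->($name);
    }
}

sub end_element {
    # Drop checks that were scoped to the children of the closing element.
    @pending = grep { $_->{depth} <= $depth } @pending;
    $depth--;
}

# Events for <a><b/><c><b/></c></a>: only the first <b> is at /a/b.
start_document();
for my $ev ([start => 'a'], [start => 'b'], [end => 'b'],
            [start => 'c'], [start => 'b'], [end => 'b'],
            [end => 'c'],   [end => 'a']) {
    my ($type, $name) = @$ev;
    if ($type eq 'start') { start_element($name) }
    else                  { end_element() }
}
```

Every start_element has to walk the pending list, which is the per-event overhead that a state machine or a direct name test would avoid.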

        This requires some overhead to run all those check subs, and I'm looking forward to the day that I can optimize some of that into state machines. For possibly even less overhead, I want to optimize "a" => $action into a simple "if $elt->{LocalName} eq 'a'" test in a compiled start_element sub. ETIME, but that would be as fast as handwritten SAX filter code.
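One way the simple "a" => $action case could be compiled down to plain name tests is sketched below. The helper `compile_rules` is hypothetical and is not part of X::F::D; it builds the source of a start_element sub as a string and compiles it with string eval. Note that the generated sub closes over `@actions`, which the approach described above deliberately avoids.

```perl
use strict;
use warnings;

# Hypothetical helper: turn simple ( name => action ) rules into a single
# start_element sub whose body is a series of direct LocalName tests.
sub compile_rules {
    my (%rules) = @_;
    my @names   = sort keys %rules;
    my @actions = @rules{@names};   # visible to the eval'd code below

    my $src = 'sub { my ($elt) = @_;' . "\n";
    for my $i (0 .. $#names) {
        # Quote the element name safely for use in a single-quoted literal.
        (my $quoted = $names[$i]) =~ s/([\\'])/\\$1/g;
        $src .= '    return $actions[' . $i . ']->($elt)'
              . ' if $elt->{LocalName} eq ' . "'$quoted';\n";
    }
    $src .= "    return;\n}";

    my $sub = eval $src or die "compile failed: $@";
    return $sub;
}

my @seen;
my $start_element = compile_rules(
    a => sub { push @seen, 'a' },
    b => sub { push @seen, 'b' },
);

# Feed a few fake start_element events; <x> has no rule and is ignored.
$start_element->({ LocalName => $_ }) for qw(a x b a);
```

For two rules the generated body is just two `if ... eq` tests, which is about what a handwritten SAX filter would contain.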

        - Barrie

        • Oh, I remembered STX as a subset of XPath; I haven't looked that way in ages. Xaos immediately reminded me of XSD, as what it does to match seems quite similar, notably in relation to what they call Total Matching (I think you do something quite similar, but I haven't thought that through yet). You could probably indeed go faster with a state machine, and I've been wondering if it's possible to see a stream as a b-tree, as explained in Murata's [coverpages.org] (can't find the PPT; it contained graphics that made it a lot clearer).

          --

          -- Robin Berjon [berjon.com]