Matts (1087)

I work for MessageLabs in Toronto, ON, Canada. I write spam filters, MTA software, high-performance network software, string matching algorithms, and other cool stuff, mostly in Perl and C.

Journal of Matts (1087)

Saturday August 04, 2001
08:27 PM

Random thoughts

[ #612 ]

A few late-night random thoughts on things I'd like to get implemented:

- A SAX filter that adds namespace capabilities to SAX1 parsing. This would be pretty simple, requiring only a few trivial changes to SAX1toSAX2.

- SAX stuff added to XML::LibXML, but serialised from the DOM tree, not directly from libxml2's stream parser (for reasons I've discussed before elsewhere).

- A SAX filter that only passes on events that match certain XSLT match patterns (which are essentially a restricted subset of XPath syntax).

- An article about how I implemented the XPath parser in XML::XPath. This is pretty neat, as the parser doesn't require an external parser library (like RecDescent), and the article would talk about left-factoring grammars. I'd write that for TPJ but they don't pay very much, and I'm a mercenary bastard :-)

- A pure Perl SAX parser. Maybe based on a left-factored grammar using the techniques used in XML::XPath. Might be hard though, and I'd need *serious* funding (I wonder if Morgan Stanley are listening :-)

- A better charset detector than Apache::MimeXML, which appears to actually be borked. I'd also like to be able to parse a character at a time up to the end of the XML declaration, and then switch encodings (using binmode in Perl 5.8). This would be the basis of the parser above.
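
The charset detection idea in the last item could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from Apache::MimeXML or any other existing module: the names sniff_xml_encoding and open_xml are made up, only the UTF-8 and UTF-16 byte order marks are recognised (UTF-32 and EBCDIC are ignored), and the file is assumed to be seekable. For simplicity it reads bytes one at a time up to the end of the XML declaration, then rewinds to just after any BOM and applies the encoding layer with binmode, rather than switching layers mid-stream.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the encoding from the leading bytes of an XML
# document. Returns (encoding name, number of BOM bytes to skip).
sub sniff_xml_encoding {
    my ($bytes) = @_;

    # A byte order mark, if present, takes priority over the declaration.
    return ('UTF-8',    3) if $bytes =~ /\A\xEF\xBB\xBF/;
    return ('UTF-16LE', 2) if $bytes =~ /\A\xFF\xFE/;
    return ('UTF-16BE', 2) if $bytes =~ /\A\xFE\xFF/;

    # Otherwise look for encoding="..." inside the XML declaration.
    if ($bytes =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/) {
        return ($2, 0);
    }

    # No BOM and no encoding pseudo-attribute: assume UTF-8.
    return ('UTF-8', 0);
}

# Hypothetical helper: open a file, sniff its encoding byte by byte, then
# hand back a filehandle that decodes the document from the start.
sub open_xml {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";

    # Read one byte at a time until the declaration ends or a sanity limit.
    my $head = '';
    while (length($head) < 512 && read($fh, my $byte, 1)) {
        $head .= $byte;
        last if $head =~ /\?>\z/;
    }

    my ($encoding, $bom_length) = sniff_xml_encoding($head);

    # Rewind to just past the BOM, then switch the handle to the encoding.
    seek($fh, $bom_length, 0) or die "Cannot seek in $path: $!";
    binmode($fh, ":encoding($encoding)")
        or die "Cannot use encoding $encoding for $path";

    return ($fh, $encoding);
}
```

Something like `my ($fh, $encoding) = open_xml('doc.xml');` would then give a handle that returns decoded characters, starting with the XML declaration itself. A streaming parser that can't seek would need to keep the bytes it has already read instead of rewinding.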