
Matts (1087)


I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high performance network software, string matching algorithms, and other cool stuff mostly in Perl and C.

Journal of Matts (1087)

Thursday January 31, 2002
06:00 AM

Improving parsing speed

[ #2548 ]

I spent yesterday hacking on XML::SAX::PurePerl (an XML parser written purely in Perl), trying to improve its performance. It's a hand-built recursive descent parser, and it's reasonably fast, but orders of magnitude slower than any C parser (and probably always will be).

Most of the time is spent in the matcher and backtracking routines (for obvious reasons), which simply ask "Does the current token (which is just a character) match this character, or this regexp?" I guess most of the time is spent there because they're called a zillion times, so I'm not sure I can reasonably expect to make it much faster.
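To make the shape of that matcher concrete, here is a minimal sketch of a "match a character or a regexp" routine. The names `new_reader` and `match_char` and the hash-based reader are hypothetical, invented for illustration; this is not XML::SAX::PurePerl's actual code or API.

```perl
use strict;
use warnings;

# Hypothetical reader: a string buffer plus a current position.
# These names are illustrative and are not XML::SAX::PurePerl's API.
sub new_reader {
    my ($text) = @_;
    return { buf => $text, pos => 0 };
}

# Return true and advance one character if the current character
# matches $what, which is either a literal character or a qr// regexp.
# Return false and leave the position unchanged otherwise, so the
# caller can try another alternative.
sub match_char {
    my ($reader, $what) = @_;
    return 0 if $reader->{pos} >= length $reader->{buf};
    my $ch = substr($reader->{buf}, $reader->{pos}, 1);
    my $ok = ref($what) eq 'Regexp' ? $ch =~ $what : $ch eq $what;
    return 0 unless $ok;
    $reader->{pos}++;
    return 1;
}

my $r = new_reader('<a>');
my @seen;
push @seen, match_char($r, '<');            # literal match
push @seen, match_char($r, qr/[0-9]/);      # fails, position unchanged
push @seen, match_char($r, qr/[A-Za-z]/);   # regexp match
print "@seen pos=$r->{pos}\n";
```

Every character of input goes through at least one sub call, a `substr`, and a comparison in a routine like this, which is one plausible reason the call count dominates the profile.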

My current problem is that I'm working from home today (err, OK, so I'm really just avoiding working), waiting for DynaRod to come and clean my drains. Here I've only got 5.00503, and for some reason Devel::DProf segfaults when I try to profile the parser, so right now I'm installing bleadperl to run it under that instead. I could use one of my laptops, but I want to get bleadperl running here anyway. Once I've got DProf running I can be a bit more categorical about where the slow stuff is.

Oh, also, did you know that / [range1] | [range2] | [range3] .../ is much slower than / [range1range2range3...] /? Strange, that. The major downside, though, is that a space inside a character class is treated as a literal space even under /x, so the merged class can't be broken up for readability. This makes the XML Char regexp awful, because it contains a massive number of different character ranges. Ah well.
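The comparison above can be tried with the core Benchmark module. This is only a sketch: the three ASCII ranges are stand-ins chosen for illustration (the real XML Char production has far more), the helper `count_matches` is made up for this example, and the timings will vary by Perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative ranges only; the real XML Char production has many more.
my $alternation = qr/^(?: [a-z] | [A-Z] | [0-9] )$/x;
my $merged      = qr/^[a-zA-Z0-9]$/;

my @chars = map { chr($_) } 32 .. 126;   # printable ASCII sample

# Count how many sample characters a pattern accepts.
sub count_matches {
    my ($re) = @_;
    my $n = 0;
    for my $ch (@chars) { $n++ if $ch =~ $re }
    return $n;
}

# Both forms should accept the same characters (26 + 26 + 10).
my $alt_count    = count_matches($alternation);
my $merged_count = count_matches($merged);
print "alternation=$alt_count merged=$merged_count\n";

# Run each for about one CPU second and print a comparison table.
cmpthese(-1, {
    alternation => sub { count_matches($alternation) },
    merged      => sub { count_matches($merged) },
});
```

Note that the alternation form can be spaced out under /x because the whitespace sits outside the brackets, while the merged class has to stay on one unbroken run.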

I also emailed my dad about the electrics, but I've had no reply so far, so I guess I'm going to have to call an electrician. Damn :-(
