I managed to get the parse of large.xml (a 70K file) down from 9 seconds to about 7 or 8 seconds. Not a huge improvement, and I didn't feel my time was terribly well spent, at least until I tried bleadperl (5.7.2 current). There the parse had previously been significantly slower (about 17 seconds) and is now down to about 8 or 9 seconds (it will always be slower because it does many more Unicode checks when running under a Unicode-capable perl). So that's good.
Well, "good" is perhaps an overstatement, since libxml2's "xmllint" program takes 29 ms to parse the same file. Ah well, I think it's time to stop worrying about parsing performance, and start thinking about full compliance instead.
Why does perl make for such a crappy parser?
Why? (Score:1)
It may be obvious, but... (Score:1)
C allows some optimizations where Perl allows them to occur in other places.
-- Godoy.
Re:It may be obvious, but... (Score:3, Interesting)
Re:It may be obvious, but... (Score:2)
My C sucks
Re:It may be obvious, but... (Score:2, Interesting)
Maybe someone needs to write a character-array manipulation class, a la PDL for huge matrix crunching. The class would gain a lot in efficiency by trading away many of the capabilities Perl ordinarily gives. This would be something gross in XS, I'm sure.
Or maybe, if I'm thinking of writing a custom text-manipulation class for Perl, something's dreadfully wrong with the world. In much the same way that we always took XML::Parser's dependence on a C parser as an indication that something was wrong (and we
J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
Profiling (Score:1)
I'm presuming the answer is "Yes," but did you profile the code?
matts: "Yes, jdavidb, I profiled the code and discovered 80% of the processing occurs in statements like $c = substr($buf, 0, 1); Get off my case! :)"
J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
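For anyone wondering what the substr-per-character pattern looks like next to a regex-based scan, here is a minimal sketch. It is not the parser's actual code: the sub names, the simplified name-character class, and the sample input are all made up for illustration. Both subs pull a run of name characters off the front of a buffer; the first does it one character at a time, the second with a single \G-anchored match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the real parser): collect a run of name
# characters one character at a time, in the style of
# $c = substr($buf, 0, 1).
sub name_by_substr {
    my ($buf) = @_;
    my $name = '';
    while (length $buf) {
        my $c = substr($buf, 0, 1);
        # Simplified name-character class, an assumption for this sketch.
        last unless $c =~ /[A-Za-z0-9_:.-]/;
        $name .= $c;
        substr($buf, 0, 1) = '';    # drop the consumed character
    }
    return ($name, $buf);
}

# Hypothetical alternative: let the regex engine consume the whole run in
# one match, anchored at the current position with \G.
sub name_by_regex {
    my ($buf) = @_;
    if ($buf =~ /\G([A-Za-z0-9_:.-]+)/gc) {
        return ($1, substr($buf, pos($buf)));
    }
    return ('', $buf);
}

my $input = 'chapter id="c1">';
my ($n1, $rest1) = name_by_substr($input);
my ($n2, $rest2) = name_by_regex($input);
print "substr: $n1 | $rest1\n";
print "regex:  $n2 | $rest2\n";
```

Both subs return the same name and remainder. Whether the regex version is actually faster for a given parser is something only the profiler can say, so treat this as a shape to try rather than a guaranteed win.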
Re:Profiling (Score:2)
I'm going to post something to perlmonks including the profiling output and the heavy subs in question. Maybe someone there can help out.