I managed to get the parse of large.xml (a 70K file) down from 9 seconds to about 7 or 8 seconds. Not a huge improvement, and I didn't feel my time was terribly well spent, at least until I tried bleadperl (5.7.2 current). There the parse had previously been significantly slower (about 17 seconds) and is now down to about 8 or 9 seconds. It will always be slower there, because the parser does many more Unicode checks when running under a Unicode-capable perl. So that's good.
Well, "good" is perhaps an overstatement, since libxml2's "xmllint" program takes 29ms to parse the same file. Ah well, I think it's time to stop worrying about parsing performance and start thinking about full compliance instead.
Why does Perl make for such a crappy parser?