

Journal of mir (51)

Wednesday May 07, 2008
11:15 AM


[ #36350 ]

Another reason why subroutines should always end with an explicit return.

XML::Twig has an XML::Parser handler for character data that ends (or rather, used to end!) with $elt->{pcdata} .= $string;. This is the common case, but it is hidden in the middle of a few if/else blocks, and it is actually an inlined method call, so it is not obvious what is going on there. Here is what happened when the XML to be parsed included a 4 MB, 60K-line, base64-encoded element: the handler was called 120K times (once for each line, and once for each newline). Because that append was the last statement executed, each call returned the current content of the element, which was promptly discarded in the bowels of XML::Parser. With an average content size of 4 MB/2 over the run, that is 120,000 * 2 MB, or roughly 240 GB of memory that needed to be allocated, copied and discarded, for absolutely no good reason at all.
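To make the cost concrete, here is a small-scale sketch of the effect. The sub append_chars is a hypothetical stand-in, not XML::Twig's actual handler, and the caller deliberately captures the return value in scalar context so it can be measured; how XML::Parser really invokes its handlers is not something this sketch claims to reproduce.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the character handler (not XML::Twig's real code).
sub append_chars {
    my ($elt, $string) = @_;
    $elt->{pcdata} .= $string;    # last statement: its value is the sub's return value
}

my $elt            = { pcdata => '' };
my $returned_bytes = 0;

# Scaled-down scenario: 1000 lines of 76 characters,
# with one call per line and one call per newline.
for my $line (1 .. 1000) {
    for my $chunk ('A' x 76, "\n") {
        # Assumption: capture the result in scalar context to observe it.
        my $ret = append_chars($elt, $chunk);
        $returned_bytes += length $ret;
    }
}

my $final_bytes = length $elt->{pcdata};
printf "final content: %d bytes, total returned: %d bytes\n",
    $final_bytes, $returned_bytes;
```

The total returned is close to (number of calls) * (final size)/2, which is the same estimate used above: it grows quadratically with the number of lines.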

In the end, adding a return at the end of the handler cut processing time from 581s to... 2s. It probably improves speed in less specific cases too.
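A minimal sketch of the fix, again with a hypothetical stand-in (append_chars_fixed) rather than the real XML::Twig handler: a bare return as the last statement means the caller gets nothing back, whatever the context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fixed handler (not XML::Twig's real code).
sub append_chars_fixed {
    my ($elt, $string) = @_;
    $elt->{pcdata} .= $string;
    return;    # bare return: undef in scalar context, empty list in list context
}

my $elt = { pcdata => '' };

my $scalar = append_chars_fixed($elt, 'abc');    # scalar context
my @list   = append_chars_fixed($elt, 'def');    # list context

printf "content: %s, scalar result defined: %s, list result size: %d\n",
    $elt->{pcdata}, (defined $scalar ? 'yes' : 'no'), scalar @list;
```

The element content is still accumulated; only the accidental return value goes away.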

And yes, it is one of the Perl Best Practices recommendations (although not for performance reasons). So where was PBP in 1997 when I wrote the first version of XML::Twig?

  • This might be a failure of the Perl core, though. Maybe it should detect a return value in void context and just skip it.
    • The return value of the handler is probably handled in C by XML::Parser, so it might be an XML::Parser problem. Or the interaction between the Perl core and XML::Parser... or maybe an expat problem. I don't really know at this point, I am just happy I found a solution.
      • So does this mean that XML::Twig will be getting significantly faster now?
        • I hoped so, but the tests I have run don't show any improvement. It looks like "big element split in many lines" is really a corner case, and that in general not returning anything from the handler doesn't help that much. :--(