
miyagawa (1653)

http://bulknews.vox.com/

Journal of miyagawa (1653)

Sunday November 25, 2007
06:37 PM

Web::Scraper (HTML::TreeBuilder::XPath) slowdown on Fedora

[ #34970 ]

Today I had an interesting report from a Web::Scraper user: his script runs really quickly (less than 1 sec) on a MacBook but very slowly (50 secs) on an AMD dual-CPU machine. Here's the dprof report:


Total Elapsed Time = 47.32165 Seconds
  User+System Time = 31.07165 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 51.6   16.03 16.033   6922   0.0023 0.0023  XML::XPathEngine::NodeSet::new
 13.5   4.208  4.208   1777   0.0024 0.0024  XML::XPathEngine::Boolean::True
 13.0   4.048  4.048   1723   0.0023 0.0023  XML::XPathEngine::Literal::new
 11.3   3.518  3.518   1666   0.0021 0.0021  XML::XPathEngine::Boolean::False

We initially suspected an XS module library issue on the dual-CPU machine, but it turned out he was using the perl that ships with Fedora, and the rpm version he had installed was 5.8.8-10.

As addressed in the RH/Fedora Bugzilla, perl 5.8.8 rpms prior to 5.8.8-22 carry a nasty patch that makes every new() (or bless) call in classes with overloaded methods really slow. HTML::TreeBuilder::XPath (and hence Web::Scraper) creates a lot of nodes on HTML pages, and XML::XPathEngine::NodeSet definitely has an overloaded function.
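If you want to see whether your own perl shows this behaviour, a rough micro-benchmark along these lines might help. It is only a sketch: the Plain and Overloaded classes and the time_constructor helper are made up for illustration (they are not part of XML::XPathEngine), and the object count is arbitrary. On an unaffected perl the two timings should be in the same ballpark; on an affected one I would expect the overloaded class to be noticeably slower.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical class without any overloading, used as the baseline.
{
    package Plain;
    sub new { my $class = shift; return bless [], $class }
}

# Hypothetical class with one overloaded operator (stringification),
# standing in for something like a node set class.
{
    package Overloaded;
    use overload '""' => sub { 'node' }, fallback => 1;
    sub new { my $class = shift; return bless [], $class }
}

# Build $count objects of $class, keep them alive until the end,
# and return the elapsed wall-clock time in seconds.
sub time_constructor {
    my ($class, $count) = @_;
    my $start   = Time::HiRes::time();
    my @objects = map { $class->new } 1 .. $count;
    return Time::HiRes::time() - $start;
}

sub main {
    my $count = 50_000;    # arbitrary; raise it if both runs finish too fast
    for my $class (qw(Plain Overloaded)) {
        printf "%-10s %d objects in %.3f s\n",
            $class, $count, time_constructor($class, $count);
    }
}

main() unless caller;
```

Run it with both the system perl and a perl you built yourself to compare.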

So this is really due to Fedora's perl patch. If you run into the same issue on Fedora, check your rpm version and upgrade to the latest, or build your own perl, which is always a good thing.
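Checking the rpm version can be scripted too. The following is a minimal sketch, not an official check: is_affected_rpm is a hypothetical helper, the 5.8.8-22 threshold is the one mentioned above, and I am assuming the release field starts with a plain number (something like "10.fc6").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a "VERSION-RELEASE" string, return 1 when it
# names a perl 5.8.8 rpm with a release number below 22, otherwise 0.
# Assumes the release begins with an integer.
sub is_affected_rpm {
    my ($version_release) = @_;
    my ($version, $release) = $version_release =~ /^(\d+(?:\.\d+)*)-(\d+)/
        or return 0;
    return ( $version eq '5.8.8' && $release < 22 ) ? 1 : 0;
}

sub main {
    # Ask rpm for the installed perl package's version and release.
    my $installed = `rpm -q --qf '%{VERSION}-%{RELEASE}' perl 2>/dev/null`;
    if ( $? != 0 || !defined $installed || $installed eq '' ) {
        print "Could not query rpm for the perl package.\n";
        return;
    }
    if ( is_affected_rpm($installed) ) {
        print "perl rpm $installed is older than 5.8.8-22; consider upgrading.\n";
    }
    else {
        print "perl rpm $installed is not in the range described above.\n";
    }
}

main() unless caller;
```

Plain `rpm -q perl` on the command line tells you the same thing if you just want to eyeball it.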
