Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

schwern (1528)

  (email not shown publicly)
AOL IM: MichaelSchwern (Add Buddy, Send Message)

Schwern can destroy CPAN at his whim.

Journal of schwern (1528)

Monday October 20, 2003
08:33 PM

We need a *really* fast find library.

[ #15301 ]

On OS X there's a little package management tool called fink. Its no Debian, but its ok. Listing the available packages takes a few seconds longer than I liked on my iBook so I decided to do a little profiling to figure out what was being slow.

Turns out it was two things. One was their version comparision function. It was repeating over 90% of its calls so I memoized it. That knocked out about 25% of the runtime.

The other was a function that was simply looking for new .info files in a big directory tree to see if the package index cache had to be updated. With the version comp optimized this was sucking down about 35% of the runtime. It was just a simple recursive directory crawl. read a directory, recurse into directories, look for a new .info file.

sub search_comparedb {
                my $path = shift;
                my (@files, $file, $fullpath, @stats);

                # FIXME: should probably just check dirs of $config->get_treelist()
                opendir(DIR, $path) || die "can't opendir $path: $!";
                @files = grep { !/^[\.#]/ } readdir(DIR);
                closedir DIR;

                foreach $file (@files) {
                                $fullpath = "$path/$file";

                                if (-d $fullpath) {
                                                next if $file eq "binary-$debarch";
                                                next if $file eq "CVS";
                                                return 1 if (&search_comparedb($fullpath));
                                else {
                                                next if !(substr($file, length($file)-5) eq ".info");
                                                @stats = stat($fullpath);
                                                return 1 if ($stats[9] > $db_mtime);

                return 0;

Pretty straight forward. A few small touch ups could be done, but basically ok code. And it was sucking down a third of the runtime. This is over directory trees containing 2700 files and about 50 non-ignored directories. Just 50 calls to this function ate nearly a second or runtime on my iBook.

I could have tried unrolling the recursion, but I never really learned how to do that (I'd be interested to be shown how its done). Anyhow, I did the simple thing:

sub search_comparedb {
                my $path = shift;
                $path .= "/"; # forces find to follow the symlink

                # Using find is much faster than doing it in Perl
                    (grep !m{/(CVS|binary-$debarch)/},
                      `/usr/bin/find $path -type f -name '*.info' -newer $basepath/var/db/fink.db`)
                          ? 1 : 0;

find. We can get away with this because the code only runs on OS X where /usr/bin/find always exists. How much faster was this? So much faster that the subroutine completely fell off the profiling radar.

Benchmark: timing 100 iterations of find, orig...
            find: 9 wallclock secs ( 0.19 usr 0.00 sys + 1.34 cusr 4.58 csys = 6.11 CPU) @ 526.32/s (n=100)
            orig: 49 wallclock secs (21.28 usr + 0.00 sys = 21.28 CPU) @ 4.70/s (n=100)

God damn. Some sort of Perl module wrapper around a find C library.

What was the next bottleneck? Oddly enough, Storable::retrieve. Its the top time killer now gulping down 40% of the runtime loading up that index file we just checked to see if it needed updating. But I'd already nearly doubled the performance so I stopped there.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.
  • Not sure whether MacOSX has a locate utility. If so, maybe this []could be useful (can't be installed via the CPAN module however, since an ancient 0.50 version by another author existed at some time).

    It wraps the C code from the GNU findutils into some XS to natively query an existing locatedb.
    • Not sure whether MacOSX has a locate utility.
      It does.
      It wraps the C code from the GNU findutils into some XS to natively query an existing locatedb.
      Nice, but fraught with all the problems of locate. ie. It'll be out of date. Though maybe I can crib some code out of it.
  • I bet you could do some "! -path '*/CVS/* -and ! -path 'debian-$binarch -and ...'" kung-fu to eliminate the Perl grep and make it even faster.