Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Journal of LTjake (4001)

Friday April 01, 2005
11:03 AM

a harmless sort?

[ #23972 ]

tholbroo had mentioned to me that if I'm doing a regular sort on Class::DBI objects, I can save a lot of time by doing a Schwartzian transform. I didn't think that there would be much of a penalty on small sets, but I decided to do some benchmarking (using the CPANTS DB):

package CPANTS::Distro::Kwalitee;

use strict;
use warnings;

use base qw( Class::DBI );

__PACKAGE__->connection( 'dbi:SQLite:dbname=cpants.db' );
__PACKAGE__->table( 'kwalitee' );
__PACKAGE__->columns(
    All => qw(
        distid
        kwalitee
        extractable
        extracts_nicely
        has_buildtool
        has_manifest
        has_meta_yml
        has_proper_version
        has_readme
        has_test_pod
        has_test_pod_coverage
        has_tests
        has_version
        is_prereq
        no_cpants_errors
        no_pod_errors
        no_symlinks
        proper_libs
        use_strict
    )
);

package main;

use strict;
use warnings;

use Benchmark qw( cmpthese );

my @distros = CPANTS::Distro::Kwalitee->retrieve_all;
@distros    = @distros[ 0..99 ];

printf( "Sorting %d rows...\n", scalar @distros );

cmpthese(
    1000,
    {
        regular_sort  => \&regular_sort,
        schwartz_sort => \&schwartz_sort
    }
);

sub regular_sort {
    @distros = sort {
        $a->kwalitee <=> $b->kwalitee
    } @distros;
}

sub schwartz_sort {
    @distros = map {
        $_->[ 0 ]
    } sort {
        $a->[ 1 ] <=> $b->[ 1 ]
    } map {
        [ $_, $_->kwalitee ]
    } @distros;
}

__END__
C:\cdbibench>perl bench.pl
Sorting 100 rows...
               Rate  regular_sort schwartz_sort
regular_sort  102/s            --          -47%
schwartz_sort 195/s           90%            --

Interesting. tholbroo++

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • If the expense of re-computing the sort key exceeds the expense of caching the sort key, then the ST wins. You merely need a "long enough" list and an "expensive enough" sort key computation.
    --
    • Randal L. Schwartz
    • Stonehenge
  • Sort your data in the database. This will be faster. It also scales better when you just want part of the result set, because you can use LIMIT (or whatever your DB supports).
    • Yeah, what autarch said.

      This is why gave up Class::DBI – it very nearly forces you to do on the Perl side what should be done at the SQL level because it makes it way too difficult to do the latter. First you run into a wall of no documentation, then you often have to put up with annoying verbiage. After reading the source for 15 minutes, I think the following will do what you need:

      CPANTS::Distro::Kwalitee->set_sql(
          all_by_kwalitee => 'SELECT __ESSENTIAL__ FROM __TABLE__ ORDER B

  • Hi,

    You can try Sort::Key that I released some days ago on CPAN, its like the Schwartzian transform, but implemented in C so a bit faster.