Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

pmichaud (6013)

pmichaud
  (email not shown publicly)
http://www.pmichaud.com/

Patrick Michaud is the pumpking for the Rakudo Perl 6 compiler. He holds a Ph.D. in Computer Science and was formerly a Professor of Computer Science at Texas A&M University-Corpus Christi. He is currently a software developer and consultant focused on open source development and applications, including Perl, PmWiki, and Linux.

Journal of pmichaud (6013)

Friday December 12, 2008
12:34 PM

It's sort of like...

[ #38080 ]

Earlier in the week, fw wrote a journal post about his First Perl6 program. I was quite excited to see this, but then read a comment from educated_foo that pointed out that the Perl 5 version is actually shorter to write.

That didn't sit too well with me, because we really ought to make common things simpler, not harder. What really bugged me (about Perl 6, not about fw's post) was the following for sorting a hash by values:

for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair {
    say $pair
}

Sorting hashes by value is a common operation, and although I can shorten the above a little bit

.say for %words.sort: { $^b.value <=> $^a.value };

it's still a bit long for my taste. That { $^b.value <=> $^a.value }
just bugs me.

Then Moritz Lenz made a comment on #perl6 that perhaps sort should do something different with arity-1 blocks, and I then had an epiphany that led to the following pattern for sorting hashes by value:

%hash.sort: { .value }

I like this so much, I've gone ahead and implemented it in Rakudo, even though it's not officially part of the spec. (I'm hoping it'll be adopted as such.) So now in Rakudo we have the following:

> my %hash = { c=>5, b=>7, a=>-4, e=>9, d=>0 };
> .say for %hash.sort: { .value }
a       -4
d       0
c       5
b       7
e       9
 
> .say for %hash.sort: { .key }
a       -4
b       7
c       5
d       0
e       9

That seems much nicer. The general principal is that if the comparison argument to "sort" takes less than two arguments, then it's used to generate the values to be compared instead of the result of a comparison.

Of course, we aren't limited to just keys or values -- any operation we want to perform on the items being sorted will work. To sort by the magnitude of the values of the hash:

> .say for %hash.sort: { .value.abs }
d       0
a       -4
c       5
b       7
e       9

And of course this generalizes to more than just hashes; if @dogs contains a list of Dog objects we want to sort:

@dogs.sort: { .name }      # sort dogs by name
@dogs.sort: { .age }       # sort dogs by age

This works because the ".name" and ".age" methods are invoked on each of the objects in the list to determine the value to use for that object in the sort.

Or for a simplistic case-insensitive sort:

> my @a = <Fruit CHERRY danish Apple berry BaNaNa apricot>;
> .say for @a.sort: { .lc }
Apple
apricot
BaNaNa
berry
CHERRY
danish
Fruit

Besides clarity, another big advantage of

@a.sort: { .lc }

over

@a.sort: { $^a.lc leg $^b.lc }

is that in the first version we compute .lc on each element only once for the entire sort, whereas in the second version it's computed once for each comparison. And since we're typically doing O(n^2) comparisons, the first version can save a lot of method or function calls.

So, that was my fun for the morning. Implementing this behavior turned out to be really easy -- in fact, I wrote the algorithm first as Perl 6 before translating it into PIR:

multi method sort(@values: &by) {
    ...
    if &by.arity < 2 {
        my @v     = @values.map(&by);
        my @slice = (0..^@v).sort: { @v[$^a] cmp @v[$^b] };
        return @values[ @slice ];
    }
    ...
}

The code just uses &by to compute the values (@v) to be used in the sort, does a sort on the indexes based on a comparison of those values, and uses the resulting sorted index list to return the (sorted) slice of the original list. Note that the items in the result list are themselves unchanged from the original list -- they simply have a new sequence according to the &by criteria.

Since then PerlJam++ has suggested that we should do similar things for min and max:

my $longest_string = @strings.max( { .chars } );

This would use .chars as the criteria for determining the longest string but returns the longest string itself.

Perl 6 is very cool.

Pm

P.S.: By the way, this means that the solution to the problem in fw's original post becomes:

my %words;
$*IN.slurp.comb.map: { %words{$_}++ };
.say for reverse %words.sort: { .values };

Not all of the above works in Rakudo yet (I think .comb is not yet implemented), but at least it's getting closer to what we'd really like to see here.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • in the first version we compute .lc on each element only once for the entire sort, whereas in the second version it's computed once for each comparison.

    Nice.

  • Maybe this doesn't make sense but some way to stick with method calls appeals to me.

    Instead of:


    @dogs.sort: { .name }

    Something like:


    @dogs.sort_by.name

    But even as already implemented, I agree it's an improvement and a nice language feature. Thanks, Patrick!

    • Part of the difference here is that .sort_by (or .sort) is a method call on the list as a whole, and any method call after that really ought to be treated as a method call on the resulting sorted list. For example,

      @list.sort.reverse
      @list.sort.map { ... }

      Also, something like @list.sort_by.name gets the invocant in the wrong place, because in order for this to work the invocants to .name should be the individual elements of @list.

      So, I think the information on how to sort (e.g., .name) really deserves to

  • Nice. Ever since I learnt Ruby I'm kind of envious that it has a .sort_by method whereas in Perl we have to do fleshed-out Schwartzian transform. This will make me feel at home again in Perl :-)
  • Turns out that this particular form of sort already exists in the Perl 6 spec -- it was just in an odd place. In Synopis 3, we have:

    The advantage of the method form is that it can be used in places that
    require tighter precedence than C<~~> provides:
     
        sort { $^a.M <=> $^b.M }, @files
     
    though that's a silly example since you could just write:
     
        sort { .M }, @files

    So, it's just Synopsis 29 that needs to be updated, and as of this morning Moritz++ has

  • 1: Why do you test if the arity 2: Do you really want to compare with cmp always? Your .say for %hash.sort: { .value.abs } example will no longer work if %hash has entries that are more then one digit. (Or did cmp change meaning between 5 and 6 to magically DWIM re numeric vs stringy compares -- I haven't been paying all that much attention for a long while now.)
    • Sorry for the rather fragmented first issue. It should say something along the lines of "why do you test if the airity is less then 2, rather then testing for it being exactly 1?"