Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

schwern (1528)

  (email not shown publicly)
AOL IM: MichaelSchwern (Add Buddy, Send Message)

Schwern can destroy CPAN at his whim.

Journal of schwern (1528)

Friday February 03, 2006
09:47 PM

sort in scalar context

[ #28577 ]

I'm going to crack this endless argument open once again because I was bitten by it yet again yesterday and I see myself getting bitten again and again in the future. I wrote "return sort ..." and I've been programming Perl for over 10 years. I grepped our code base at work and found 34 instances of 'return sort', each of them subtle bug waiting to strike. Try grepping your own code for it. The Perl core has 24 instances (9 are in the debugger alone).

For those who don't know, the behavior of sort in scalar context is undefined. I believe its because folks have never agreed on what it should do. The arguments fall into three camps:

1) sort should return the number of items in the list.

For: Then it will work just like map and grep which sort looks and behaves very simlar to.
Against: If all you want to know is the length of the list, why sort at all?
For: Because this is very common: return sort @foo;
Against: You can't assume that because a function returns a list in list context
                  that it will return the length of the list in scalar context.
For: Its a pretty safe assumption. return map, grep or @array all work that way. Its extremely common in real world code and a solid default behavior.

Against: There's lots of things which return lists in list context but not their length in scalar context.
For: The argument that "X should be inconsistent because Y and Z are inconsistent" is dubious at best, particularly in Perl where there's plenty of inconsistencies to dredge up. But let's look at those inconsistencies, are they really all that inconsistent? Those which don't return their length can be split into several categories: Things which always return a fixed size list (each, getpw*, gm/localtime, select, stat, times, caller); iterators (glob, readline, readdir); list behavior (as opposed to array behavior); other (unpack, qx). For things which return fixed sized lists, its not useful for it to return the length and unlikely to surprise anyone that they do something different. Iterators are a well known pattern. The behavior of lists has always been odd and full of traps for the unwary, so using it as an argument is dubious. As for the handful of outriders, they are just that. Of the things which do return their length in scalar context there is a clear pattern that "foo BLOCK/REGEX, LIST" (map, grep, split) returns length. sort is the exception.

For: What if I'm the subroutine author and want my function to return a sorted list in list context but a length in scalar? I have to write either "my @sorted = sort @foo; return @foo;" or use the archane "force list context" trick of "() = sort @foo;" Neither is natural.

For: People write "return sort" all the time without thinking. Its in the Perl core. Its all over CPAN. Close that trap!

For: Its a trivial patch.

2) sort should do insert escoteric functionality here.

For: sort can return a ratio of already sorted adjacent pairs! (or whatever)
Against: That sounds about as useful as keys() returning the bucket ratio.

3) Leave it undefined.

For: We leave it undefined in case a clear solution presents itself in the future.
Against: IT IS THE FUTURE! How much longer are you going to wait?

Against: People write "return sort" all the time. By leaving this undefined a large trap has been created.

For: Unlike map or grep, "scalar sort @foo" will return the exact same value as "scalar @foo". So why use sort in scalar context at all?
Against: "return sort @foo".

That's how I see it. Feel free to add your own arguments.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.
  • Sort in scalar context should return bunnies. Then everybody would be happy. Bunnies are fluffy and cute. If you don't think that, you'll find that they're awfully tasty. If you're a vegetarian, they conveniently return a ratio of already sorted pairs. You can't lose!

  • In the meanwhile, it sounds like a good thing to add to Perl::Critic []...


    • I'd second that, as it is what I was going to suggest. I see return sort .. in a same way as return undef just because one doesn't want something to be returned. It's ambiguous and in my opinion the developer should be aware of that. Would be a lot easier with a Perl-Critic policy, though I don't know how spread that already is "in the wild."
      Ordinary morality is for ordinary people. -- Aleister Crowley
  • It’s interesting that neither argument against returning the list length presents a use case that would be precluded; both just argue on principles. And the leave-it-undefined stance takes the same basic position.

    The magnitude of sorted pair ratios != 1.0 is meaningless, so that’s not even as useful as keys in scalar context.

    We have exactly one real-world use case, return sort @foo, so there is exactly one meritorious argument, and it has no compelling counter. Why is anyone even arguing abo

  • I've always thought the obvious behavior would be to return the first element. It doesn't gain much (saves you typing two parens, maybe), but it's just what I always thought made sense...
  • First of all, I use 'return sort ...' in my code on a regular bases. I don't consider this a bug. If I write a function that's supposed to return a (sorted) list, and someone calls the function in scalar context, then it's the calling code that's buggy, not the subroutine. If I wanted the function to behave sensible when used in scalar context, I'd make it so. But I doubt that it would then return the number of elements of the list. I'm with rjbs on this one - the first (or last) element would be more usefu

    • If I wanted the function to behave sensible when used in scalar context, I'd make it so.

      The other way to look at that is "I want my functions to behave insensibly in scalar context unless I remember to make it so" which strikes me as a little peevish and definately not DWIM.

      That aside, here's something more concrete--how do you do that? Take this sample for example, which is much like the code I recently stumbled over...

              return sort { $a cmp $b }

      • The other way to look at that is "I want my functions to behave insensibly in scalar context unless I remember to make it so" which strikes me as a little peevish and definately not DWIM.

        That sounds like for all functions that I design to return a list, I should take the trouble of figuring out what it they should do in list context.

        Sorry, but I pass. For functions where it makes sense, I do so. But for many, I won't. Consider the context to be a pre-condition - just like the arguments being passed in.

    • That would require of Perl to do extra magic when it encounters return sort, and it’s non-reentrant. I vote no. If you want an iterator, then an iterator object/closure should be returned. Let’s not create more instances of the each trap.

  • My vote for the meaning of sort in scalar context is to return whether the list was already sorted.

    unless( sort @list ) {
        # @list is not sorted, deal with it...

    If you want to go beyond a simple boolean, the index of the first out of order element sounds a lot more useful than the proprtion of sorted neighbours - and, like the boolean result, it can short circuit the scan of the list as soon as an out of order pair is found.

    • to return whether the list was already sorted

      An is_sorted @list predicate would be an interesting builtin/XS/Inline function, and should run faster than sort (if coded equally well). It should run about the same speed as a least @list sort-variant optimized to return (sort @_)[0].

      However, I'm not sure is_sorted should be called scalar sort, that doesn't seem any better than calling least as scalar sort. Changing sort from verb-transitive to verb-predicate when going from list to scalar context is rather

      # I had a sig when sigs were cool
      use Sig;
  • All of the return sort's in the debugger are called in list context - Term::Readline's completion function in most cases, and a similar bit of code in the options handlling. It's probably not worth patching it.
    • Three times a year, a new person is caught by surprise by sort in scalar context. The rest of the time, noone cares or notices. Why not spare these three guys the surprise and just give it a reasonable default? Noone will notice if it’s patched.