Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

Ovid
  (email not shown publicly)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Wednesday June 16, 2004
03:36 PM

Clarification to Database Programming Rule #1

[ #19309 ]

There's been enough disagreement (and no agreement, to date) with Database Rule #1 that I felt some clarification is in order (pun intended.)

The rule was: In production code, if your SELECT statement returns more than one record and it doesn't have an ORDER BY clause, it's probably a bug.

First I'll deal with a side issue that was brought up: despite the objection, performance is not a consideration. Unless there are systemic reasons to consider performance (which is why Perl's not a great choice for AI -- darn it), performance should not be considered. Correctness of code is far more important. However, it was pointed out at the same time that Perl can do the sorting. This is frequently wrong. Let the database do the things the database is naturally good at. Unless Perl's sorting offers something the database can't (as phillup implied) it's probably not going to be as efficient as the database sorting. It's optimized for that sort of thing.

As for "but I'm just dumping data into Excel" argument, I don't buy it. Admittedly, my experience may not be typical, but usually if I'm dumping some data into a spreadsheet and I'm not worried about sorting, it's a one-off task. However, if it's production code (as specified in the rule), then ORDER BY tends to become more important because it's easier to test ordered data ... and my pedantic little self says that if you check in code without checking in tests than you're being a bad programmer (as I have, on more than one occassion.) Of course, one could argue for a sort in the test and sometimes this is OK, but most of the time I would think that it's not. Consider:

my @actual = sort {
        $a->[1] cmp $b->[1] # sort by customer name
                ||
        $b->[3] <=> $a->[3] # and then order total
    } fetch_foo_from_db();
is_deeply(\@actual, \@expected, '... and the datasets should match');

If we fail to sort the data, we cannot guarantee a deterministic query and this can lead to intermittant test failures, so we're sorting the data here. If we consider the performance argument mentioned earlier, this is clearly not an optimal solution because if performance does become a consideration, we are probably returning a lot of data and Perl is unlikely to sort it as efficiently as the database will (and sorting dates can really get ugly.)

So when I say that failure to order your select results is probably a bug, I really mean to say "apply logic as early as possible." Admittedly, this means that my database rule becomes a specific example (and possibly a bad one) of this issue. A beautiful example of this problem is how so many applications mishandle error reporting.

$self->report_error($error) if $error;

sub report_error {
  my ($self, $error) = @_;
  my $fh = $self->_error_fh;
  print $fh $self->_time .":  ($error)\n";
  return $self;
}

That's bad design and begging for bugs. Whenever someone calls the function and forgets to test for the existence of an error, empty messages are written to the error log. It's better written as:

$self->report_if_error($error);

sub report_if_error {
  my ($self, $error) = @_;
  return $self unless $error;
  ...
}

Now you never have to worry about the test again. However, sometimes you need the method call wrapped in the conditional because the method may not always be able to determine when it should be called. When this happens, the logic is justifiably pushed higher in the code. In the case of sorting return results from the database, the earlier it can be done, it should be done. Doing it later means that you're more likely to forget it or do it incorrectly. If it can be done in the database, it should. If it can't, but the order must not vary, then wrap the call in a function or method which minimizes the chance of the sort order being skipped.

In short: I grudgingly admit that my "database programming rule #1" was too simplistic :)

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • This might inspire similar levels of disagreement.

    If a query is expected to return one row, then it had better be specify the key in the WHERE clause. Never depend on some other considerations to result in only one row being returned. Uniquely constrained fields might be used instead.

    Corollary: any time you have a query which returns only one record, encapsulate it in a function. That might be a PL/SQL function or a Perl function or something else.

    --
    J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
  • Unless Perl's sorting offers something the database can't (as phillup implied) it's probably not going to be as efficient as the database sorting. It's optimized for that sort of thing.

    With Oracle (and others?), a sort operation is guaranteed to increase CPU usage and can significantly affect query performance. In addition, there can be extra I/O overhead, which can cause you to slam into your I/O limits, especially if a disk sort is required. Plus, all rows must be accessed and sorted before the first

    • You've mentioned great examples of where optimization should still be done after the fact. If I don't know that a query (and I frequently create huge queries) is going to cause a problem with performance, I am not going to worry about it up front. The simplest thing to do is simply to have the SQL specify the order. That's what SQL is there for, after all. If some DBA tells me "I don't want you using SQL the way it's intended to be used," I would suggest that the DBA is sorely mistaken -- if that's a bl

      • Also note that ORDER BY clauses are not necessary if an index exists on some or all of the columns in the clause.

        To be honest, I was not aware of that. Can you provide some documentation?

        It's common sense, if an index is being used and there's no ORDER BY clause, then rows will be returned in the order of the index. But I wouldn't generally rely on it in production application code and I would explicitly specify the ORDER BY if that's the order I wanted.

        I have run across production code that had somet

  • Afraid it still doesn't convince me. For testing I would tend to either mock the DB input, or use bag/set tests.

    The efficiency argument doesn't persuade me.

    If you need ordering for the production code then it's needed by the problem domain and it obviously makes sense.

    If ordering is not needed by the production code and just makes the test more convenient to write then it smells to me. You're unnecessarily exposing implementation details to the test suite.

    You're also swapping slower tests for slower pro
    • I think I should have been clearer about the "apply logic as early as possible" rule. I'm not advovating applying unecessary logic. I'm advocating applying the logic as early as is reasonable. If you need your database records sorted, sort them in the database, if possible. This discussion, to be quite frank, came from me working on some code which returns records in database order and is sorted after it's fetched, but this led to places in the code where it should have been sorted, but wasn't. I was v