Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

Ovid
  (email not shown publicly)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Saturday December 10, 2005
03:43 PM

NULLs: yes, no, or don't know?

[ #27927 ]

So I finally got my grubby little hands on a copy of Database In Depth by C. J. Date. I've just finished Chapter 3 and so far, I've been blown away. My irritations with the shortcomings of databases have been wrong. They're generally shortcomings with SQL and people's (mis)understanding of relational theory.

One bit which fascinates me is Date's arguments against NULLs. It fascinates me because it really hits home regarding some problems I've had but it also irritates me because I've read these arguments before and not paid much attention to them. It's time for me to change that. Now, if I reject his reasoning, I want it to be because I've thought things through, not because I wasn't paying attention.

Part Date's rejection of NULLs stems from how terms are defined: an attribute (column) in a tuple (row) in a relation (table) must contain a name and a type. Since NULLs, by definition, are a not a type, it follows that an attribute must not have a NULL value.

Of course, without an explanation of the theory, many are going to argue that Date is just using wordplay to define NULLs out of existence. To deal with this, Date comes up with the following example:

+-----+--------+  +-----+--------+
| SNO |  City  |  | PNO |  City  |
+-----+--------+  +-----+--------+
| S1  | London |  | P1  | NULL   |
+-----+--------+  +-----+--------+

"SNO" is "Supplier Number" and "PNO" is "Part Number". If these tables are called "supplier" and "part", then fetching all part numbers is trivial:

SELECT PNO
FROM   part;

This should be logically equivalent to the following:

SELECT PNO
FROM   part
WHERE  City = City;

Of course, this query returns no rows because you cannot logically compare "unknown" values. This is fundamentally the simplest example one can give which demonstrates how NULL values can give erroneous results. However, the query is so ludicrous that many would be inclined to dismiss it out of hand. After all, we'd never write anything so useless, right?

Moving along, we need to write a query where we need to know where either the supplier and part cities are different, the part city is not Paris, or both. Here's one way to write that query:

SELECT S.SNO, P.PNO
FROM   supplier S, part P
WHERE  S.City <> P.City      -- first condition
   OR  P.City <> 'Paris';    -- second condition

Because we cannot compare NULL values and the only "part" has an unknown city, we get no rows. But is this correct? Either that unknown city is Paris or it isn't. If it is Paris, the first condition should evaluate as true and thus we'll get a result. If it is not Paris, the second condition should evaluate as true and we should get a result. Thus, logically we should get a result even though the query returns no rows.

The mere existence of a NULL in the database means we may get erroneous results in our queries. However, the more complicated the queries, the more difficult it will be to tell when we'll get erroneous results and, I suspect, the more likely it is when we will get erroneous results if the NULL columns are used in the WHERE clause.

These points really resonate with me. While I've certainly found three-value logic attractive (I don't like perl -le 'print "yes" if (undef) < 4' printing "yes"), it does appear to be problematic in the above example. I'm beginning to think that three-value logic may have a place in procedural or OO code, but perhaps not in declarative code. More than once I've worked on a large systems where NULLs were causing problems in queries that were often very difficult to track down.

What I'm wondering is what the counter-examples are. Such a fundamental issue shouldn't be controversial unless there are clear counter-examples which demonstrate why we do need NULLs. The "that's the way we've always done it" argument doesn't hold much sway with me because I view that akin to "hitting my thumb with a hammer is better than hitting it with an axe" -- just don't hit your damned thumb, will ya?

Note: avoiding the "NULL" problem can be dealt with via the technique presented in How to Handle Missing Information without Using Nulls (pdf) by Hugh Darwen.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • My brain is hurting!

    Okay, not really. But this has been interesting reading.

    • What's really frustrating is that much of the problems associated with nulls would go away if database vendors would start supplying DBMS which enforce relational data (SQL doesn't). First, we would need something called "distributed foreign keys" -- mentioned in the PDF I link to. Next, databases would have to be optimized to handle 6NF, something which is rarely seen, and 1NF, something which is not well-understood. Further, queries would need to return relations. In other words, they would need to re

      • I can't believe I'm only now seeing this post...

        Since reading Database In Depth, I've also realized that the object-relational impedance mismatch is a myth. It is a persistent myth, but a myth none the less.

        Here's the quote from that book that I always go back to:

        All of which goes a long way, incidentally, toward explaining why a true "object/relational" system would be nothing more nor less than a true relational system—which is to say, a system that supports the relational model, with all that

  • Ovid,

    First of all, I don't understand how your example, perl -le 'print "yes" if (undef) < 4, is an example of three-value logic. The parentheses create a list, and a list in scalar context evaluates to the number of items it has, not the value of the first item.

    This works well because, unless I'm missing something, there is no three-value logic in Perl. Perl's undef is not the same as SQL's NULL; undef always evaluates to false! IOW, it is a value, just an undefined one.

    The easiest way f

    • A wise man (was it Perlis or Bentley or was he quoting?) once said the only sensible values for constants are 0 and 1, all else are parameters that may indeed change with requirements. (Two:=1+1 was allowed as a special case for buffer-swapping, one in use and one in prep. Perlis said something to the effect of any supposed constant is someone else's variable.)

      So Three-valued is probably a bad basis for a logic. Date's point about Nulls are there are many more one out-of-band value possible, and you rare

      --
      Bill
      # I had a sig when sigs were cool
      use Sig;