
Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Monday October 02, 2006
09:09 AM

Stupid NULLs

[ #31194 ]

They're frickin' counter-intuitive, I tell ya.

In mysql, I just ran the following query:

mysql> SELECT count(some_field) FROM some_table WHERE some_field IS NULL;
+-------------------+
| count(some_field) |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

Change count(some_field) to count(*) or count(id) and it returns "105".

I don't know if this particular behavior is mysql specific or not, but it's very annoying.
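For anyone who wants to reproduce this without a MySQL server handy, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is swapped in for MySQL, the table and column names are reused from the query above, and the row contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, some_field TEXT)")

# Hypothetical data: three rows where some_field is NULL, two where it is set.
conn.executemany(
    "INSERT INTO some_table (some_field) VALUES (?)",
    [(None,), (None,), (None,), ("a",), ("b",)],
)

def count_where_null(expr):
    # expr is interpolated directly; fine for a fixed demo, not for user input.
    sql = f"SELECT count({expr}) FROM some_table WHERE some_field IS NULL"
    return conn.execute(sql).fetchone()[0]

field_count = count_where_null("some_field")  # counts non-NULL some_field values
star_count = count_where_null("*")            # counts matching rows
id_count = count_where_null("id")             # id is never NULL here

print(field_count, star_count, id_count)  # prints: 0 3 3
```

With this data, only count(some_field) comes back as 0; the other two report the three matching rows.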

  • The old ANSI_NULLS what should compares do trick. :-)

    In MSSQL, you can flip this sort of thing back and forth with ANSI_NULLS on/off options. Wonder if MySQL has the same sorts of things. It can lead to some unexpected things if you're doing IN subselects during updates/deletes.

  • It kind of makes sense if you think of NULL as "unknown". You can't count it because it's unknown...

    -Dom

    • Agreed here. Pronouncing NULL as "unknown" has kept me out of more than a bit of trouble.
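The "unknown" reading, and the IN-subselect surprise mentioned earlier in the thread, can be sketched like this. It uses Python's sqlite3 rather than MSSQL or MySQL, so nothing here touches ANSI_NULLS; the parent and child tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing with NULL yields NULL (None in Python), neither true nor false.
eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
neq = conn.execute("SELECT NULL <> 1").fetchone()[0]
# IS NULL is the test that actually gives a definite answer.
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

conn.execute("CREATE TABLE parent (id INTEGER)")
conn.execute("CREATE TABLE child (parent_id INTEGER)")
conn.executemany("INSERT INTO parent VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO child VALUES (?)", [(1,), (None,)])

# One NULL in the subselect makes "NOT IN" unknown for every non-matching id,
# so no rows come back at all.
not_in = conn.execute(
    "SELECT id FROM parent WHERE id NOT IN (SELECT parent_id FROM child) "
    "ORDER BY id"
).fetchall()

# Filtering the NULLs out of the subselect gives the rows most people expect.
not_in_filtered = conn.execute(
    "SELECT id FROM parent WHERE id NOT IN "
    "(SELECT parent_id FROM child WHERE parent_id IS NOT NULL) ORDER BY id"
).fetchall()

print(eq, neq, is_null)         # prints: None None 1
print(not_in, not_in_filtered)  # prints: [] [(2,), (3,)]
```

The same NOT IN pattern inside a DELETE or UPDATE would silently match nothing, which is probably the kind of unexpected result being described.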
  • I'm not sure if it's a MySQL specific thing or not but it's the very reason that I still use count(*) regardless of the database I'm using.
  • With Oracle at least. It's a misconception that count(foo) is faster than count(*), in case that's the reason you were avoiding it.

    Mind you, I can speak for certain with MySQL, but it may be similar.

    • Actually, Jay Pipes of MySQL gave a talk this weekend about MySQL perf stuff. One of the slides was about not really using count at all if you don't have to, depending on things like MyISAM/InnoDB.

      http://jpipes.com/presentations/mysql_perf_tuning.pdf
      pg.16

      Offtopic, but he did a good job of presenting things that I wouldn't normally think about as just a DBIC->mysql user, like how the two engines deal with indexes across multiple keys, etc. Probably stuff any good MySQL dbadmin knows and I never think ab
      • Interesting, thanks for the link.

        That should have been "can't speak for certain", btw. :)

  • I believe that what count(X) really does is count the number of X that are non-NULL. That's why you're getting 0.
    --
    xoa

  • count(expression) counts the number of times expression is not NULL. Which explains your answer. I always use 'count(*)' if I want to count the number of rows - several databases have optimized 'count(*)'. Others use 'count(1)' to count rows.
    • The only two incantations of COUNT() I have used that worked the way I expected were COUNT(*) (to count rows) and COUNT(DISTINCT field) (to count distinct values of a field, and even then I'm not sure what it does with NULL). This is with Oracle; with other RDBMSes YMMV. I'm sure there's some magic I could do with COUNT(field) and COUNT(field1, ..., fieldN), but I don't know how it works. I may have done that in Oracle class or database class at the university, but I didn't retain any information on how

      --
      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
    • Regrettably, this is something that much documentation [mysql.com] does not make clear. I'm expecting count() to return the number of instances of a particular value. Given that NULL can be thought of as "unknown", I suppose one could argue that it makes sense that it doesn't count the number of values. However, it also seems reasonable for one to assume that count(some_field) will return how many unknown values there are.

      Your explanation is perfectly correct. I'm just frustrated that an arguably "intuitive" answe

      • I'm just frustrated that an arguably "intuitive" answer turns out to be very wrong.

        That's because your "intuitive" answer isn't the "intuitive" answer of someone who breathes SQL. Just like what you find "intuitive" in Perl isn't "intuitive" for many people who also use Perl.

        I do find the answer "intuitive", but only because I keep reminding myself that a NULL in a relational database is a very special thing, and far more undefined than 'undef' is in Perl. If you try to think of a database 'NULL' as

      • Ditto Abigail about intuition. I think the way COUNT works in SQL w.r.t. NULLs is actually useful and helpful.

        Your stance about NULLs reminds me about the saying about GOTO and the apprentice, the journeyman and the master. (Not that I can claim to be a master, mind…)

      • I guess some of the strange logic is this: if you want to get a count of the number of unique values, you use COUNT(DISTINCT field). If you want to get a count of records, you use COUNT(*). Now what do you want when you ask for COUNT(field)? Do you want a count of distinct values? Use COUNT(DISTINCT field). Do you want a count of records? Use COUNT(*). If you grabbed field and then counted the number of values (not distinct), you'd get absolutely the same results as COUNT(*), right?

        So I guess some

        --
        J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
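As a rough check of the three forms discussed in this sub-thread, here is a small sketch using Python's sqlite3 (SQLite standing in for Oracle or MySQL). The table name t, the column name field, and the data are all made up. It suggests that COUNT(field) only matches COUNT(*) when the column holds no NULLs, and that COUNT(DISTINCT field) skips NULLs as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT)")

# Hypothetical data: two distinct non-NULL values across three rows, plus two NULLs.
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("x",), ("x",), ("y",), (None,), (None,)],
)

rows, non_null, distinct, nulls = conn.execute(
    "SELECT COUNT(*), "                   # every row
    "COUNT(field), "                      # rows where field is not NULL
    "COUNT(DISTINCT field), "             # distinct non-NULL values
    "COUNT(*) - COUNT(field) FROM t"      # one way to count the NULLs
).fetchone()

print(rows, non_null, distinct, nulls)  # prints: 5 3 2 2
```

The last column is one way to get the number the original query was after, without a WHERE clause.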
  • Ignoring NULLs in COUNT (just as Abigail explained) is just consistent with other aggregate functions in SQL. For example, SUM(foo) sums up every 'foo' value except the NULL ones. So there's no magical transformation of NULL to 0 and no concern with exceptions raised because there are some NULLs out there. For SUM, this usually gives the same result as converting NULLs to 0, but that is not so everywhere. MAX(foo) and MIN(foo) also work this way: ignore NULLs - which is good because they don't compare just numbers
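
The point about aggregates skipping NULLs can be sketched with Python's sqlite3 (again SQLite rather than MySQL; the table t, column foo, and values are hypothetical). AVG is included because it shows where skipping a NULL and treating it as 0 give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(10,), (None,), (20,)])

total, lowest, highest, average, counted = conn.execute(
    "SELECT SUM(foo), MIN(foo), MAX(foo), AVG(foo), COUNT(foo) FROM t"
).fetchone()

# AVG divides by the two non-NULL rows, giving 15.0. Treating the NULL
# as 0 would divide 30 by three rows and give 10.0 instead.
avg_with_zero = conn.execute("SELECT AVG(COALESCE(foo, 0)) FROM t").fetchone()[0]

# With no non-NULL input at all, SUM returns NULL rather than 0.
sum_of_nulls = conn.execute("SELECT SUM(foo) FROM t WHERE foo IS NULL").fetchone()[0]

print(total, lowest, highest, average, counted)  # prints: 30 10 20 15.0 2
print(avg_with_zero, sum_of_nulls)               # prints: 10.0 None
```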