Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

  (email not shown publicly)
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Monday December 07, 2009
02:43 AM

Find the bug (sql)

[ #39985 ]

Update: You can ignore the order_date below. It's a red herring and I probably should have left it out, but I had liked the fact that by putting it in the query, I added more complexity, thus making the real bug more difficult to spot.

Assume you're a diligent programmer. You've designed your database carefully. Foreign constraints are correct, you have no null columns and you've kept a nice, simple design. Now your boss wants you to provide a list of all customers who've spent less than £15 on your Web site because you want to offer them a special promotion. Here's the SQL you've written:

SELECT    first_name,
          SUM(price) AS total
FROM      customer
LEFT JOIN orders     ON = orders.customer_id
LEFT JOIN order_item ON   = order_item.order_id
GROUP BY  first_name, last_name, order_date
HAVING    total < 15
ORDER     BY order_date ASC;

The two left joins are there because a customer may never have placed an order before. Heck, they may have started an order but not added any items to it. The 'having' statement is required because you generally can't use aggregates in where clauses.

You run the SQL and hand-check the results very carefully. Choosing a random sampling of customers returned, you verify that none of them have spent more than £15 on your site. Nonetheless, you have a bug. What is it? What are the implications of the bug?

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.
  • I think your boss wants you to give a special to your customers who have given the business less than 15 pound ever. Yet you're grouping by order_date. Thus if your customer has purchased something for 10 pound last week and 10 pounds yesterday I would expect them to show up twice in the output:

    John, Smith, 2009-11-30, 10
    John, Smith, 2009-12-06, 10

    • I think if you want some information about customers it is probably a bad idea not grouping by customer id. Depending on your site you may or may not have 2 customers with the same name but you will almost always have the occasional repeat customer.

      Actually , looking at the code I would assume I misunderstood the requirements as I don't see why order_date is there at all.

      • I should have removed order_date. It was present in the first version of the SQL I wrote.

  • "price" I presume is from the order_items table. If there are no orders and hence no order_items, the "price" field is a null (from the outer joins). sum(null) is null, and null is not less than 15, the comparison is null. This means you will not see the customers with no (completed) orders.
    • Even worse, if a client has completed orders, and at least one incomplete order, this gives a null price, and the sum will be null again.
      • What do you mean by "incomplete order"? I just added an order for a customer who already had orders, but did not add order items for it. The query results were the same, so I think I may have misunderstood you.

        • An "orders" record without corresponding "order_items" record(s) would be an "incomplete" order.

          Regarding your example, the result depends on whether there were other "complete" orders for the customer or not, and how your database treats nulls. Since a null means "totally unknown", anything plus null results in null, and comparing null with 15 is also null (neither true nor false). You should be using sum(coalesce(price,0)) for your definition of the field total.

  • SELECT first_name,
                        SUM(COALESCE(price, 0)) AS total
    FROM customer
    LEFT JOIN orders ON = orders.customer_id
    LEFT JOIN order_item ON = order_item.order_id
    GROUP BY first_name, last_name, order_date
    HAVING total 15
    ORDER BY order_date ASC


  • I believe you should be grouping by customer_id rather than first/last name... otherwise you could be combining results for two John Smiths that order on the same date.