mscolly correctly identified the SQL bug I posted. Sadly, no one discussed the implications of this bug and I think they're the most interesting part of this. Essentially, it comes down to the following:
SUM(price) AS total -- what if there's no price?
LEFT JOIN orders ON customer.id = orders.customer_id
LEFT JOIN order_item ON orders.id = order_item.order_id
GROUP BY first_name, last_name, order_date
HAVING total < 15 -- what does NULL < 15 evalute to?
ORDER BY order_date ASC;
The proper solution (as mscolly pointed out) is to change the "SUM" line to this:
COALESCE( SUM(price), 0 ) AS total
The English language, as we know, is ambiguous. If you boss had come in and asked for all customers whose orders (more accurately, whose orders with order items) totaled less than £15 pounds, then the above query would have actually been correct, but another programmer coming along to maintain it could be forgiven for thinking it's in error. If you ever write SQL which is likely to produce NULLs (e.g., outer joins), you should explicitly handle that case if you actually do anything with those NULLs.
But in this case, "customers whose orders total less than £15" is significantly different in meaning than "customers who spent less than £15" and the latter is what we want, but the former is what we have. While the above code seems logical, it gives a logically incorrect answer because it omits customers without orders (or order items), even though they're clearly intended. However, NULLs make it very difficult to identify what you actually mean because the database can't know why something is NULL.
Now consider a simpler, yet silly, example:
WHERE salary > 50000;
What happens if the salary field is NULL? You'll get a list of employees whose known salary is NOT NULL. Why might they not have a salary? Maybe they're an hourly employee and the salary field is not applicable. Maybe they're the CEO and he doesn't think you need to know his salary. Maybe they're an ex-employee and they have no salary.
Taking this a bit further, imagine that all employees in the table are current and all have salaries (no hourly workers), but the salary field is still sometimes NULL because the board of directors doesn't want you to know their salaries. With me so far? In this scenario, it is the case that everyone has a salary; you just don't know what some of them are. So here's the kicker:
WHERE salary = salary;
That won't return anyone on the board of directors, even though you know they have a salary. Furthermore, most would think it's self-evident that p = p, but in three value logic of databases, this is sometimes true and sometimes false. Heck, because of this, the following does not always evaluate correctly, even though we would think it does:
CASE WHEN master_brand_id = master_brand_id THEN '='
WHEN master_brand_id != master_brand_id THEN '!='
END AS 'comparison'
Sure, you say, but you're comparing something to itself. You don't do that in the real world. No? So look at this:
CASE WHEN m.master_brand_id = s.master_brand_id THEN '='
WHEN m.master_brand_id != s.master_brand_id THEN '!='
END AS 'comparison'
FROM service s
If the s.master_brand_id is allowed to be NULL, than the comparison field will always have a NULL value when s.master_brand_id is NULL. It's easy to debug in this simple example, but what if that was a subquery? It looks fine, but it all breaks down in the presence of NULL values.
I didn't start with that example because people would say it's silly, but starting with the "order" example shows how NULLs in databases can return logically incorrect data and the reduction down to the simple p = p case not holding shows why this happens.
At this point, I can see people saying "yeah, but we already know that about databases." And this is true. It's well-known that certain types of queries can generate NULLs even though there are no NULL values in the database. Regrettably, many people assume the database logic is, well, logical. The p = p failure is a strong rebuttal, but I suppose some people assume that hitting themselves in the head with a hammer is normal.
If you really want to have some fun, read this blog entry about NULL values. In the comments, the author even explains how to deal with NULLs in outer joins, but it requires a relational database (very few databases really are) and that people understand what first normal form is really about. (If you think you know, please define "atomic values" in the comments below).
I wonder how database design would look today if, instead of 3VL, databases threw an exception when you tried to apply an operator or aggregation ('=', '+', 'SUM', etc.) to NULL values?
Note: I've discussed the problem with NULL values before, but in realizing I had a better real-world example, I thought it would make more sense to readers.