Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.
  • A wise man said any SQL query without an ORDER BY clause is a stealth bug waiting to emerge when the implicit order dependency explodes into view.

    In the customer-visit itinerary, ORDER BY may actually be a better solution than DISTINCT -- the repetition value in the BAG will be needed to help plan how long to lay over in each city, as it measures how many visits are required there. GROUP BY (City) COUNT(Customer) might be better, and would get you the equivalent of DISTINCT.

    # I had a sig when sigs were cool
    use Sig;
    • You've come up with a rather artificial reason why one might, in this case, want a bag instead of a set, but your example is arbitrary. Who knows why Alice is in a particular city? Maybe she has some weird contractual agreement that she needs to at least be in those cities once a year? The point is the same: if I ask for information, why should I want my query, by default, to repeat itself rather than just give me the information I ask for?

      Don't focus on the specifics of this particular problem becaus

      • I think we're in violent agreement that naive SQL is bad, and we're just arguing over which form of naivety is worst!

        This is not just a artificial quibble on the artificial example. I gave a different, more general rule that explains why the artificial example is wrong at an even deeper level. I claim that in general both general rules need to be addressed together.

        I agree that SQL having been designed abstractly by mathematicians (with whom I happily self-identify) so that it encourages thinking SETs yet was implemented so it often returns BAGs is a problem of abstraction. Having a SHOW DUPS keyword instead of a DISTINCT keyword, reversing the defaults, would be more practical yet theoretically sound. Furthermore, both SET and BAG are the wrong abstractions: a ORDERED SET is the usually right return type for all queries. And when a BAG is what is desired, it should be a Multiplicity-encoded Ordered Multiset, not an unordered Bag with repetition. (Obviously(?), a totally order is to be preferred over a partial order -- in SQL terms, that means using a long enough composite key.)

        To repeat, both SET and BAG are unordered, pure mathematical concepts (dear to my heart), but in practical programming, we need order. Any query returning a true (unordered) SET with DISTINCT but no ORDER BY is just as broken if not more broken as one returning a true (unordered) BAG (without DISTINCT) which you've objected to. Just as BAGginess is fixed with DISTINCT (or COUNT .. GROUP BY ..), DisORDER is cured by ORDER (or GROUP BY).

        If in the artificial example the query weilder had used ORDER BY [as all practical SQL queries should be written whether desiring BAG or SET, to avoid getting bitten by hidden order dependencies], the artificially weird result would not have looked so weird, and the multiplicity of cities in the artificial example would have had emergent semantics, and the accountant wouldn't need firing (emotional statement in original artificial example). And the desirability of applying COUNT ... GROUP BY would have suggested itself.

        There is more than one way to convert a BAG to a SET with COUNT(... )FROM with ORDER BY ... GROUP BY ... , which is the MultiSet impelmentation of BAG, where the multiplicity is an attribute of the element rather than manifested by repetition, is a semantics preserving subsitute for DISTINCT.

        I agree with your basic thesis that all results should be converted to SET form. However, I would prefer COUNT ... GROUP rather than DISTINCT and further require the result be ORDERED.

        # I had a sig when sigs were cool
        use Sig;