Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Journal of cosimo (4138)

Wednesday May 07, 2008
03:30 AM

I'm not sure I understand SQL...

[ #36348 ]

Can any good souls out there explain to me why this SQL query actually works? And no, it doesn't matter what it does... I already know that :)

    SELECT concept, count(*) AS cnt, c.rtype
      FROM concepts JOIN conceptbindings AS c
        ON (c.conceptid = concepts.id)
  GROUP BY concept
  ORDER BY cnt DESC
     LIMIT 0,100;

If there's some explanation which I missed entirely (and that's possible), good.
If there's not, can you guess the DBMS ? :)

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • The weird part is that the join is a 1-m, right? That is, conceptbindings.conceptid is not UNIQUE. Thus, c.rtype should not be admissible in the SELECT clause. A proper DBMS would complain about the fact that you need an aggregate function (like COUNT, or MIN).

    MySQL does not complain, right?

    • Exactly. c.rtype shouldn't be there, or an aggregate function should be used. Or, you could GROUP BY c.rtype too.

      But as it is, it shouldn't work. At least within my current understanding of SQL...

      And yes, it's MySQL 5.

      • I think MySQL takes the first value it sees for that file, within the group

        The reasonableness of this is debatable, but at least make getting some result easier...

    • Really, c.rtype should not be admissible there even if conceptbindings.conceptid was UNIQUE (that is, it was a 1-1 relation), since there would be anyway no guarantee that c.rtype had a unique value for each concept group (value).
      Simply, the SQL standards for GROUP BY require that the columns in the select list must be in the GROUP BY expression or they must be arguments of aggregate functions (PostgreSQL would indeed complain, and so does even MS Access :-)

      Cosimo, you're right about SQL, this behaviou
      • Simply, the SQL standards for GROUP BY require that the columns in the select list must be in the GROUP BY expression or they must be arguments of aggregate functions

        That's not entirely true. The standards after 1992 state that the column must either be in the GROUP BY, part of an aggregate function, OR a "functional dependent" of something in the GROUP BY.

        I'm not sure if we can demonstrate a presence or lack of functional dependence based solely on the original SQL posted, but the spec is a little m

        • That's not entirely true. The standards after 1992 state that the column must either be in the GROUP BY, part of an aggregate function, OR a "functional dependent" of something in the GROUP BY.

          As far as I can see this was introduced in SQL:1999, while it's quite common to quote as SQL standards just SQL-92 (or even just SQL-89), since some think that later revisions are not to be considered real standards anymore:
          http://www.tdan.com/view-articles/4923/ [tdan.com]
          http://articles.techrepublic.com.com/5100-10878_11-1046268.html [com.com]

          Anyway you're formally correct ;-)

          I'm not sure if we can demonstrate a presence or lack of functional dependence based solely on the original SQL posted, but the spec is a little more nuanced than your quote suggests.

          Good point.

          I implied that if we had had a way to be sure that c.rtype had a unique value for each concept value (which is what function

  • It surprised me when I saw that Sybase accepted a similar syntax. But the result seemed next to useless. Non-aggregate columns not in the "group by" clause result in a cartesian join, so, every row in the conceptbindings table (via the rtype column) will join with every combination of conceptid and count. I'm not sure if MySQL behaves the same or differently.
    • MySQL returns "some value", maybe the first, for each row of grouped concept. But yes, I'd rather prefer a cartesian product than this unexpected behaviour, even if I can understand why it's like that...
      • Oh no you don't. Not after you've screwed up and accidentally done a cartesian product between 100,000 records and 100,000 records again.

        I loathed that.

        If they want to save me time, just make the GROUP BY clause optional. Because 99% of the time my GROUP BY clause is just going to be all of the non-aggregate functions in my SELECT clause. And I hate having to type in all of that duplicate information.