Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

Ovid
  (email not shown publicly)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Friday August 14, 2009
10:07 AM

Guess the Solution: Materialized Paths and "ORDER BY"

[ #39460 ]

Imagine that you need to store the following tree structure in your database:

    5---
   / \  \
  62 32  7
/   / \
9  3   29  1
          / \
         48  55

One way this is typically handled is by creating a table linking parents and children:

parent | child
-------+------
5      | 62
5      | 32
5      | 7
62     | 9
... and so on

Because SQL isn't brilliant at recursion (and because most of us aren't as brilliant as Joe Celko), we find ourselves writing several SQL calls to walk up the tree. However, this becomes prohibitively expensive when you have millions of objects in your tree and you need to repeatedly traverse it. Enter materialized paths.

A materialized path is a database denormalization where you flatted each branch of the tree. You join all of the parents, in order, with a separator character which is guaranteed not to be in the id. So the materialized path for 48 might be 5.7.1.48.

node | path
-----+---------
1    | 5.7.1
3    | 5.62.3
5    | 5
7    | 7.32
9    | 5.62.9
29   | 5.7.29
32   | 5.32
48   | 5.7.1.48
55   | 5.7.1.55
62   | 5.62

They can be tricky to maintain, but if you get them right, you can find all parents IDs in a single SQL statement.

Today I ran into a nasty problem where I needed the parents, but I had to guarantee that the order of those parents was preserved at the database level (via ORDER BY). The SQL looks sort of like this:

SELECT *
FROM  tree
WHERE id IN (5,7,1,48)
ORDER BY ...

Um, what goes in that order by? I need the items returned in the same order as the ids in the IN ids. I asked many people and they were stumped. They came up with Oracle specific answers. Some suggested writing the ids to a temp table with sort criteria and joining on that. Others through up their hands. One person suggested a complicated case statement.

Then one of my colleagues, Mark Morgan, said "can't you just ..."

The solution was rather counter-intuitive to me, but once I saw it, I was dumbfounded in its simplicity. Do you see it?

Update: I've posted the solution in the comments, so if you want to think about it a bit first, don't read 'em.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • Too lazy to test it by would something like

    ORDER BY FIELD( id, 5,7,1,48)

    do the job?

  • This seems obvious?

        ORDER BY id=5, id=7, id=1, id=48

    --
    rjbs
    • Is that SQL standard? The solution I've gone with is actually much simpler.

      • I have never seen a SQL where that didn't work. I am also interested to see what could be much simpler than that. :)

        --
        rjbs
        • OK, a few people on Twitter have struck out on this and Abigail guessed something similar to what you and Adrian guessed.

          The solution is:

          ORDER BY length(path)

          Turns out it's a deterministic emergent property of materialized paths. There is one caveat I realized, though. If you have multiple trees and you want to see the root nodes, you can do this:

          SELECT id FROM tree WHERE id = path;

          However, some people omit the ID from the path. So you'd have this (in particular, note the path for the 5 node):

          node |

          • Seems problematic:

            1.23.45.67

            1.2.3.4.5

            Perhaps you could count dots.

            --
            rjbs
            • No, you can guarantee that conflicting paths won't exist as you walk up the tree.

            • _"Seems problematic:"_

              Nah - the problem is selecting all the parents of a child in order. The length of the path is guaranteed to increase.

            • The advantage of explicitly counting dots (rather than relying on string length to always increase with the number of dots) is that it makes path-lengths comparable. Not directly applicable to this problem, but not uncommon.
          • Wouldn't it be enough to just ORDER BY path?
  • Depending on the size of the tree, I've found that the nested sets representation is very nice for queries.

    The query that you need is given as an example in this page:

    http://dev.mysql.com/tech-resources/articles/hierarchical-data.html [mysql.com]

    (Although this is a mysql.com site, there is nothing MySQL specific with the solution)

    The problem with nested sets is updates: they might change a large number of records - worst case, inserting a new node at the left-most place will update all the rows.

    But smaller trees, they

    --
    life is short