Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.
Imagine that you need to store the following tree structure in your database:
5---
/ \ \
62 32 7/ / / \
9 3 29 1
/ \
48 55
One way this is typically handled is by creating a table linking parents and children:
parent | child
-------+------
5 | 62
5 | 32
5 | 7
62 | 9
... and so on
Because SQL isn't brilliant at recursion (and because most of us aren't as brilliant as Joe Celko), we find ourselves writing several SQL calls to walk up the tree. However, this becomes prohibitively expensive when you have millions of objects in your tree and you need to repeatedly traverse it. Enter materialized paths.
A materialized path is a database denormalization where you flatted each branch of the tree. You join all of the parents, in order, with a separator character which is guaranteed not to be in the id. So the materialized path for 48 might be 5.7.1.48.
node | path
-----+---------
1 | 5.7.1
3 | 5.62.3
5 | 5
7 | 7.32
9 | 5.62.9
29 | 5.7.29
32 | 5.32
48 | 5.7.1.48
55 | 5.7.1.55
62 | 5.62
They can be tricky to maintain, but if you get them right, you can find all parents IDs in a single SQL statement.
Today I ran into a nasty problem where I needed the parents, but I had to guarantee that the order of those parents was preserved at the database level (via ORDER BY). The SQL looks sort of like this:
SELECT *
FROM tree
WHERE id IN (5,7,1,48)
ORDER BY...
Um, what goes in that order by? I need the items returned in the same order as the ids in the IN ids. I asked many people and they were stumped. They came up with Oracle specific answers. Some suggested writing the ids to a temp table with sort criteria and joining on that. Others through up their hands. One person suggested a complicated case statement.
Then one of my colleagues, Mark Morgan, said "can't you just
The solution was rather counter-intuitive to me, but once I saw it, I was dumbfounded in its simplicity. Do you see it?
Update: I've posted the solution in the comments, so if you want to think about it a bit first, don't read 'em.
FIELD? (Score:1)
Too lazy to test it by would something like
ORDER BY FIELD( id, 5,7,1,48)
do the job?
Re: (Score:2)
I think that's MySQL specific. The very simple solution is much easier.
Re: (Score:1)
Ah...
ORDER BY CHAR_LENGTH( path )
?
Re: (Score:2)
That's pretty much it. I responded to Ricardo with the solution and a caveat [perl.org].
too easy? (Score:1)
This seems obvious?
ORDER BY id=5, id=7, id=1, id=48
rjbs
Re: (Score:2)
Is that SQL standard? The solution I've gone with is actually much simpler.
Re: (Score:1)
I have never seen a SQL where that didn't work. I am also interested to see what could be much simpler than that. :)
rjbs
Re: (Score:2)
OK, a few people on Twitter have struck out on this and Abigail guessed something similar to what you and Adrian guessed.
The solution is:
Turns out it's a deterministic emergent property of materialized paths. There is one caveat I realized, though. If you have multiple trees and you want to see the root nodes, you can do this:
However, some people omit the ID from the path. So you'd have this (in particular, note the path for the 5 node):
Re: (Score:1)
Seems problematic:
1.23.45.67
1.2.3.4.5
Perhaps you could count dots.
rjbs
Re: (Score:2)
No, you can guarantee that conflicting paths won't exist as you walk up the tree.
Re: (Score:1)
_"Seems problematic:"_
Nah - the problem is selecting all the parents of a child in order. The length of the path is guaranteed to increase.
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
With the caveat above, yes, I think it might be.
Have you looked at nested sets? (Score:1)
Depending on the size of the tree, I've found that the nested sets representation is very nice for queries.
The query that you need is given as an example in this page:
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html [mysql.com]
(Although this is a mysql.com site, there is nothing MySQL specific with the solution)
The problem with nested sets is updates: they might change a large number of records - worst case, inserting a new node at the left-most place will update all the rows.
But smaller trees, they
life is short