All the Perl that's Practical to Extract and Report

use Perl

Journal of gnat (29)

Monday March 31, 2003
07:34 AM

Databases: Paging and Hierarchies

[ #11333 ]
I just put the database chapter to bed, and found two interesting things while working on it. First was that there doesn't seem to be a generic module for paging through results. I ended up writing code to do this for the book (you know, displaying a subset of a table with "Viewing Results 26-50" and the appropriate back/forward buttons) and while it was web-specific, I kept thinking "it wouldn't be too hard to generalize this". Before I do so, has anyone else invented this particular wheel first?
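
For illustration, here is a minimal sketch of the arithmetic behind a "Viewing Results 26-50" display. It is not the code from the book: it uses Python's standard sqlite3 module only so that it is self-contained, and the table name `items`, the function `fetch_page`, and the page size of 25 are all assumptions made for the example.

```python
import sqlite3

def fetch_page(conn, page, per_page=25):
    """Return one page of rows plus the numbers needed for a
    'Viewing Results 26-50' banner and back/forward buttons.

    The table 'items' and its columns are hypothetical.
    """
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    last_page = max(1, (total + per_page - 1) // per_page)  # ceiling division
    page = min(max(1, page), last_page)                     # clamp out-of-range requests
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return {
        "rows": rows,
        "first": offset + 1 if rows else 0,  # 0 signals an empty result set
        "last": offset + len(rows),
        "total": total,
        "has_prev": page > 1,
        "has_next": page < last_page,
    }

# Demo against an in-memory database holding 60 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [("item %d" % n,) for n in range(1, 61)])

page2 = fetch_page(conn, 2)
print("Viewing Results %d-%d of %d" % (page2["first"], page2["last"], page2["total"]))
```

The edge cases are mostly in the clamping: an empty table, a page number past the end, and a short final page all have to produce sensible banners and button states.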

And I learned a cute trick for searching hierarchies. You know how to store a tree in a table, right: you give each node an id and store the parent id of the node as well:


So now you can find the children of node 5 easily: SELECT * FROM node WHERE parent=5. But it's hard to select all the descendants of node 5 (children, grandchildren, and so on). That requires a tree traversal, which involves lots of database queries, which gets ugly fast.
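
A minimal sketch of that difference, assuming a `node` table with just `id` and `parent` columns. The sample tree and the helper names `children` and `descendants` are made up for the example, and Python's standard sqlite3 module is used only to keep it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER)")
# Hypothetical tree: 1 is the root; 2 and 5 sit under 1;
# 12 and 13 sit under 5; 19 sits under 12.
conn.executemany("INSERT INTO node (id, parent) VALUES (?, ?)",
                 [(1, None), (2, 1), (5, 1), (12, 5), (13, 5), (19, 12)])

def children(conn, node_id):
    """Direct children only: a single query."""
    return [row[0] for row in conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY id", (node_id,))]

def descendants(conn, node_id):
    """Everything below node_id: one query per node visited."""
    found = []
    for child in children(conn, node_id):
        found.append(child)
        found.extend(descendants(conn, child))
    return found

print(children(conn, 5))     # [12, 13]
print(descendants(conn, 5))  # [12, 19, 13]
```

The recursive version issues one SELECT for every node it visits, which is where the cost comes from on large or deep trees.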

The cute trick is to build another table containing the path of each node ("." means that this is node 19 whose parent is node 12 whose parent is node 5 whose parent is node 1). Then finding node 5 and its children is as simple as:

SELECT id, path FROM paths WHERE path LIKE '%.5.%'
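
Here is a runnable sketch of that query. The exact path format is an assumption: this example stores each path with a leading and a trailing dot (for example `.1.5.12.19.`), so that the pattern matches node 5 itself and does not match ids such as 15. The sample rows are invented, and Python's standard sqlite3 module is just the harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paths (id INTEGER PRIMARY KEY, path TEXT)")
# Assumed format: every id in the path is wrapped in dots.
conn.executemany("INSERT INTO paths (id, path) VALUES (?, ?)", [
    (1,  ".1."),
    (2,  ".1.2."),
    (5,  ".1.5."),
    (12, ".1.5.12."),
    (15, ".1.15."),       # must NOT match: contains ".15." but not ".5."
    (19, ".1.5.12.19."),
])

rows = conn.execute(
    "SELECT id, path FROM paths WHERE path LIKE ? ORDER BY id", ("%.5.%",)
).fetchall()
print(rows)  # [(5, '.1.5.'), (12, '.1.5.12.'), (19, '.1.5.12.19.')]
```

One caveat: a pattern that starts with % generally cannot use an ordinary index on `path`, so on a big table it may be worth matching on the full prefix (here `'.1.5.%'`) when it is known.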

Suuuper sneaky! I'm really beginning to appreciate how different it is to program in SQL...


  • I think HTML::Pager fits the bill (at least somewhat). Can't say I've ever used it, though. Looks like it depends on HTML::Template for its output (although it claims it can still be used without it). HTH.

  • Data::Page (Score:2, Informative)

    You probably want to investigate Data::Page and its cousin Data::PageSet.

    I wrote a paging module before. I wouldn't wish the edge cases on anybody.


    • Oooh, I forgot to mention Class::DBI::Pager, which is wizzy if you're using Class::DBI.


    • Well bollocks. I could have sworn I googled without success for "page database results in perl". So much for being done with Chapter 14! Thanks,


      • This is ancient history by now but Tim Bunce supposedly did a review of this, at OSCON in 1999 or 2000? IMO paging through RDBMS results is always going to be ugly since SQL is a set-oriented language. You *have* to break the model to do that.
  • The cute trick is to build another table containing the path of each node ("." means that this is node 19 whose parent is node 12 whose parent is node 5 whose parent is node 1).

    Beware that this is a premature optimization.

    While this technique works fine with a static tree, it makes tree transformations hideously difficult. Imagine the pain that comes from moving node 12 to be a sibling of node 5, or removing node 5 and consequently promoting its children up a level.

    The other technique

    • Heh, if you wanted to be really ugly, you could keep the parent id in the same table, but have a trigger to rebuild the secondary table with the complete path-like listings. It'd be heavy on the database, but might be affordable depending upon how many updates you're doing.
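
A rough sketch of the trigger idea, under several assumptions: it uses SQLite trigger syntax (run through Python's standard sqlite3 module), the table names `node` and `paths` and the dotted path format `.1.5.12.` are invented for the example, and only inserts are handled. Parents must be inserted before their children, and a reparenting UPDATE would need a further trigger to rewrite the paths of the whole moved subtree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node  (id INTEGER PRIMARY KEY, parent INTEGER);
CREATE TABLE paths (id INTEGER PRIMARY KEY, path TEXT);

-- On every insert into node, derive the new path from the parent's path.
-- A root node (parent IS NULL) has no parent row, so it starts from '.'.
CREATE TRIGGER node_path_insert AFTER INSERT ON node
BEGIN
    INSERT INTO paths (id, path)
    SELECT NEW.id,
           COALESCE((SELECT path FROM paths WHERE id = NEW.parent), '.')
           || NEW.id || '.';
END;
""")

# Parents first, so each child can find its parent's path.
conn.executemany("INSERT INTO node (id, parent) VALUES (?, ?)",
                 [(1, None), (5, 1), (12, 5), (19, 12)])

path_of = dict(conn.execute("SELECT id, path FROM paths"))
print(path_of[19])  # .1.5.12.19.
```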


    • It seems a bit premature to call this premature :-) As with every problem, the right data structure depends on your data and how it's accessed. Just as an alphabetized flat file is quick to binary search but slow to insert, whereas an unordered flat file is quick to insert (append) but slow to search, you choose your solution based on what you know about the data. If you were going to be doing a lot of hoists or reparenting operations, then I guess you'd have to test it in the field. This doesn't change
      • How do you recursively find all the children of node 5 without doing a ton of SELECTs? That's the problem that a path table gets around.

        I personally don't think that the "ton of SELECTs" is all that horrible. That's probably my personal style though.

        It starts out with something like this:

        SELECT id FROM nodes WHERE parent IN (5)

        ...which returns a list of IDs. That list of IDs then goes into a new SQL statement. The IDs are also pushed into a list that contains all IDs returned from your per-leve

        • I've written code like the one you describe dozens of times and I must say that it certainly was "fast enough". I'd be worried if I had very deep trees though, as that would be when it could become costly.

          But then why store a tree in an RDB? Wouldn't XPath be a *lot* more pleasant to access random things in a tree? ;)


          -- Robin Berjon

  • If SQL didn't suck so much, it'd offer built-in methods to explode trees and return the result. But it does suck.

    Fabian Pascal talks about hierarchies a fair amount in his book Practical Issues in Database Management. Interesting stuff.
  • My current favorite DBI abstraction is Class::DBI. In any case, paging has just become easier with Class::DBI::Pager.
    Casey West
  • Nat,

    I've been doing a bunch of hierarchy work recently in SQL, and the hands-down best way to do the type of typical hierarchy queries that I do in SQL is to store (or at least additionally represent) hierarchies as sets.

    See, SQL is a set-operation language. So having hierarchies stored as an adjacency list (e.g. employee table has a 'manager' column) -- although sufficient -- forces you out of SQL and into a procedural language for tree-walking.

    Imagine you want to get "this person's management chain" or
    • Crap, my post got its < and > mangled, even though I told the submission widget it was plain text.

      Here's the decoder guide -- not actual SQL.

      My Management Chain: Select where left LESS THAN my_left AND right GREATER THAN my_right

      My Vast Empire: Select where left GREATER THAN OR EQUAL TO my_left AND right LESS THAN OR EQUAL TO my_right

      Joel Noble
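
The two decoder-guide queries can be sketched as real SQL. Everything concrete here is an assumption made for illustration: the `staff` table, the column names `lft` and `rgt` (chosen to avoid the SQL keywords LEFT and RIGHT), the sample people, and their hand-assigned left/right numbers. Python's standard sqlite3 module is only the harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
# Hypothetical org chart, numbered by walking the tree:
#   boss(1,10) -> alice(2,7) -> carol(3,4), dave(5,6)
#              -> bob(8,9)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("boss", 1, 10), ("alice", 2, 7), ("carol", 3, 4),
    ("dave", 5, 6), ("bob", 8, 9)])

def management_chain(conn, name):
    """Everyone whose interval strictly encloses mine, top down."""
    return [r[0] for r in conn.execute("""
        SELECT a.name
          FROM staff AS me, staff AS a
         WHERE me.name = ?
           AND a.lft < me.lft AND a.rgt > me.rgt
         ORDER BY a.lft""", (name,))]

def vast_empire(conn, name):
    """Me plus everyone whose interval falls inside mine."""
    return [r[0] for r in conn.execute("""
        SELECT d.name
          FROM staff AS me, staff AS d
         WHERE me.name = ?
           AND d.lft >= me.lft AND d.rgt <= me.rgt
         ORDER BY d.lft""", (name,))]

print(management_chain(conn, "carol"))  # ['boss', 'alice']
print(vast_empire(conn, "alice"))       # ['alice', 'carol', 'dave']
```

Each question is answered with a single statement and no tree walking, which is the appeal of the set representation.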

  • If you have a static tree (i.e., no inserts or infrequent, scheduled inserts), there's a trick from one of Joe Celko's SQL books that might apply. (From memory...) Walk the tree using a linear sequence generator that starts at 1. As each node is reached pre-order, store (insert/update) the next id that will be generated. The next id represents the "left" side of the subtree rooted under each non-terminal node. As the node is reached post-order, assign (update) its id from the sequence generator. When you do
  • I just read an article today on something similar to this. Read the section on 'Materialized Path' at this site:
  • Nobody has mentioned Oracle's CONNECT BY. Hierarchical queries in one statement!

    I doubt Ziggy is aware of this, but the tree hack you describe is roughly how ASPN works. Ugly and inflexible, yes, but it gave us the performance we wanted, without having to shell out for Oracle.

    It may have been a premature optimization. I personally was horrified by the idea during the design meetings. But it works very well in practice and it's easy to understand.