I recently ran into a problem quite familiar to developers: how to efficiently store and query a complex hierarchy several levels deep? Unfortunately, this was for a well-established project built on MySQL, so the recursion and Common Table Expressions of SQL Server or PostgreSQL weren’t options.
I’ve worked with simple trees where a lowly Adjacency List pattern was sufficient, but this one could potentially grow very large over time and thus needed something more robust. Nested Sets seem to be all the rage, but that pattern is terribly inefficient for writes. I also looked at its speedier cousin, Nested Intervals, but that pattern’s a bit more complicated than I was hoping for. Materialized Paths didn’t fit the bill either. I finally settled on Closure Tables, as described by Bill Karwin.
The only time I got tripped up a little with closure tables was when I needed to select all nodes in a tree AND display them in the correct order.
Consider the following table of categories:
The closure table to describe our tree will look like this (I include the fictional root node id 0):
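In DDL terms, a minimal sketch of the two tables might look like the following. The column names id, parent, child, and depth are taken from the queries in this post; the name column, the types, and the keys are assumptions, as is the convention (from Karwin's description) that every node also has a self-referencing row at depth 0.

```sql
-- Hypothetical schema sketch; only id, parent, child and depth
-- are confirmed by the queries in this post.
CREATE TABLE category (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL          -- assumed display column
);

CREATE TABLE closure (
    parent INT UNSIGNED NOT NULL,       -- ancestor id (0 = fictional root)
    child  INT UNSIGNED NOT NULL,       -- descendant id
    depth  INT UNSIGNED NOT NULL,       -- levels between parent and child
    PRIMARY KEY (parent, child),
    KEY idx_closure_child (child)       -- supports lookups by child
);
-- No foreign key on parent here, because the fictional root id 0
-- is assumed not to exist as a row in category.
```

The secondary index on child is there because queries that collect a node's ancestors look rows up by child rather than by parent.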
So how do we select the whole tree of invertebrates? Easy:
SELECT c.* FROM category c JOIN closure cl ON cl.child = c.id WHERE cl.parent = 2
Hmmm. They’re out of order. We want Spiders, Insects, and Butterflies to come right after their ancestor, Arthropods. Change the 2 to 0 in the previous query to get the entire tree, and you’ll notice the problem’s even more obvious there.
There is no way to immediately query a closure table to sort rows by relationship. Fortunately, it’s very easy to generate a string of the ancestry, i.e., a materialized path. We’ll just join in the closure table a second time, use GROUP_CONCAT() to generate the path, and then order by path:
SELECT c.*, GROUP_CONCAT(cl2.parent ORDER BY cl2.depth DESC SEPARATOR '.') AS path FROM category c JOIN closure cl ON cl.child = c.id JOIN closure cl2 ON cl2.child = cl.child WHERE cl.parent = 2 GROUP BY c.id ORDER BY path
Try it on the whole tree (parent = 0):
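One thing to keep in mind with the GROUP_CONCAT() query above: the path is a string, so siblings are ordered as strings, and id 10 will land before id 3. Descendants still follow their ancestors, but if numeric sibling order matters, one possible variation is to zero-pad each id before concatenating. This is a sketch, not part of the original approach; the padding width of 10 and the alias sort_path are assumptions.

```sql
-- Sketch: same joins as the path query, but every id is left-padded
-- with zeros so string order matches numeric order.
-- The width of 10 is an assumption; use one that covers your largest id.
SELECT c.*,
       GROUP_CONCAT(LPAD(cl2.parent, 10, '0')
                    ORDER BY cl2.depth DESC
                    SEPARATOR '.') AS sort_path
FROM category c
JOIN closure cl  ON cl.child  = c.id
JOIN closure cl2 ON cl2.child = cl.child
WHERE cl.parent = 2
GROUP BY c.id
ORDER BY sort_path;
```

Padded paths get long quickly, and GROUP_CONCAT() truncates its result at group_concat_max_len (1024 bytes by default), so very deep trees may need that setting raised.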