Sorting Children by Hierarchy with Closure Tables

02 Jul 2013 Joshua Chamberlain

I recently ran into a problem quite familiar to developers: how to efficiently store and query a complex hierarchy several levels deep? Unfortunately, this was for a well-established project built on MySQL, so the recursion and Common Table Expressions of SQL Server or PostgreSQL weren’t options.

I’ve worked with simple trees where a lowly Adjacency List pattern was sufficient, but this one could potentially grow very large over time and thus needed something more robust. Nested Sets seems to be all the rage, but it’s so terribly inefficient at writing. I also looked at its more complex but speedy cousin, Nested Intervals, but that pattern’s a bit more complicated than I was hoping. Materialized Paths didn’t fit the bill either. I finally settled on Closure Tables, as described by Bill Karwin.

The only time I got tripped up a little with closure tables was when I needed to select all nodes in a tree AND display them in the correct order.

Consider the following table of categories:

id name
1 Vertebrates
2 Invertebrates
3 Mammals
4 Dogs
5 Cats
6 Reptiles
7 Fish
8 Arthropods
9 Cnidarians
10 Spiders
11 Insects
12 Butterflies

The closure table to describe our tree will look like this (I include the fictional root node id 0):

parent child depth
0 1 1
0 2 1
0 3 2
0 4 3
0 5 3
0 6 2
0 7 2
0 8 2
0 9 2
0 10 3
0 11 3
0 12 4
1 1 0
1 3 1
1 4 2
1 5 2
1 6 1
1 7 1
2 2 0
2 8 1
2 9 1
2 10 2
2 11 2
2 12 3
3 3 0
3 4 1
3 5 1
4 4 0
5 5 0
6 6 0
7 7 0
8 8 0
8 10 1
8 11 1
8 12 2
9 9 0
10 10 0
11 11 0
11 12 1
12 12 0

So how do we select the whole tree of invertebrates? Easy:

SELECT c.*
FROM category c, closure cl
WHERE cl.child = c.id AND cl.parent = 2

Result:

id name
2 Invertebrates
8 Arthropods
9 Cnidarians
10 Spiders
11 Insects
12 Butterflies

Hmmmnm. They’re out of order. We want Spiders, Insects, and Butterflies to come right after their ancestor, Arthropods. Change the 2 to 0 in the previous to get the entire tree, and you’ll notice the problem’s even more obvious there.

There is no way to immediately query a closure table to sort rows by relationship. Fortunately, it’s very easy to generate a string of the ancestry, i.e., a materialized path. We’ll just join in the closure table a second time, use GROUP_CONCAT() to generate the path, and then order by path:

SELECT c.*, GROUP_CONCAT(cl2.parent ORDER BY cl2.depth DESC SEPARATOR ".") path
FROM category c
JOIN closure cl ON cl.child = c.id
JOIN closure cl2 ON cl2.child = cl.child
WHERE cl.parent = 2
GROUP BY c.id
ORDER BY path

The result:

id name path
2 Invertebrates 0.2
8 Arthropods 0.2.8
10 Spiders 0.2.8.10
11 Insects 0.2.8.11
12 Butterflies 0.2.8.11.12
9 Cnidarians 0.2.9

Try it on the whole tree (parent = 0):

id name path
1 Vertebrates 0.1
3 Mammals 0.1.3
4 Dogs 0.1.3.4
5 Cats 0.1.3.5
6 Reptiles 0.1.6
7 Fish 0.1.7
2 Invertebrates 0.2
8 Arthropods 0.2.8
10 Spiders 0.2.8.10
11 Insects 0.2.8.11
12 Butterflies 0.2.8.11.12
9 Cnidarians 0.2.9

Perfect.