Sorting Children by Hierarchy with Closure Tables

I recently ran into a problem quite familiar to developers: how to efficiently store and query a complex hierarchy several levels deep? Unfortunately, this was for a well-established project built on MySQL, so the recursion and Common Table Expressions of SQL Server or PostgreSQL weren’t options.

I’ve worked with simple trees where a lowly Adjacency List pattern was sufficient, but this one could potentially grow very large over time and thus needed something more robust. Nested Sets seems to be all the rage, but it’s so terribly inefficient at writing. I also looked at its more complex but speedy cousin, Nested Intervals, but that pattern’s a bit more complicated than I was hoping. Materialized Paths didn’t fit the bill either. I finally settled on Closure Tables, as described by Bill Karwin.

The only time I got tripped up a little with closure tables was when I needed to select all nodes in a tree AND display them in the correct order.

Consider the following table of categories:

id	name
1	Vertebrates
2	Invertebrates
3	Mammals
4	Dogs
5	Cats
6	Reptiles
7	Fish
8	Arthropods
9	Cnidarians
10	Spiders
11	Insects
12	Butterflies

The closure table to describe our tree will look like this (I include the fictional root node id 0):

parent	child	depth
0	1	1
0	2	1
0	3	2
0	4	3
0	5	3
0	6	2
0	7	2
0	8	2
0	9	2
0	10	3
0	11	3
0	12	4
1	1	0
1	3	1
1	4	2
1	5	2
1	6	1
1	7	1
2	2	0
2	8	1
2	9	1
2	10	2
2	11	2
2	12	3
3	3	0
3	4	1
3	5	1
4	4	0
5	5	0
6	6	0
7	7	0
8	8	0
8	10	1
8	11	1
8	12	2
9	9	0
10	10	0
11	11	0
11	12	1
12	12	0

So how do we select the whole tree of invertebrates? Easy:

SELECT c.*
FROM category c, closure cl
WHERE cl.child = c.id AND cl.parent = 2

Result:

id	name
2	Invertebrates
8	Arthropods
9	Cnidarians
10	Spiders
11	Insects
12	Butterflies

Hmmmnm. They’re out of order. We want Spiders, Insects, and Butterflies to come right after their ancestor, Arthropods. Change the 2 to 0 in the previous to get the entire tree, and you’ll notice the problem’s even more obvious there.

There is no way to immediately query a closure table to sort rows by relationship. Fortunately, it’s very easy to generate a string of the ancestry, i.e., a materialized path. We’ll just join in the closure table a second time, use GROUP_CONCAT() to generate the path, and then order by path:

SELECT c.*, GROUP_CONCAT(cl2.parent ORDER BY cl2.depth DESC SEPARATOR ".") path
FROM category c
JOIN closure cl ON cl.child = c.id
JOIN closure cl2 ON cl2.child = cl.child
WHERE cl.parent = 2
GROUP BY c.id
ORDER BY path

The result:

id	name	path
2	Invertebrates	0.2
8	Arthropods	0.2.8
10	Spiders	0.2.8.10
11	Insects	0.2.8.11
12	Butterflies	0.2.8.11.12
9	Cnidarians	0.2.9

Try it on the whole tree (parent = 0):

id	name	path
1	Vertebrates	0.1
3	Mammals	0.1.3
4	Dogs	0.1.3.4
5	Cats	0.1.3.5
6	Reptiles	0.1.6
7	Fish	0.1.7
2	Invertebrates	0.2
8	Arthropods	0.2.8
10	Spiders	0.2.8.10
11	Insects	0.2.8.11
12	Butterflies	0.2.8.11.12
9	Cnidarians	0.2.9

Perfect.

parent	child	depth
0	1	1
0	2	1
0	3	2
0	4	3
0	5	3
0	6	2
0	7	2
0	8	2
0	9	2
0	10	3
0	11	3
0	12	4
1	1	0
1	3	1
1	4	2
1	5	2
1	6	1
1	7	1
2	2	0
2	8	1
2	9	1
2	10	2
2	11	2
2	12	3
3	3	0
3	4	1
3	5	1
4	4	0
5	5	0
6	6	0
7	7	0
8	8	0
8	10	1
8	11	1
8	12	2
9	9	0
10	10	0
11	11	0
11	12	1
12	12	0

parent	child	depth
0	1	1
0	2	1
0	3	2
0	4	3
0	5	3
0	6	2
0	7	2
0	8	2
0	9	2
0	10	3
0	11	3
0	12	4
1	1	0
1	3	1
1	4	2
1	5	2
1	6	1
1	7	1
2	2	0
2	8	1
2	9	1
2	10	2
2	11	2
2	12	3
3	3	0
3	4	1
3	5	1
4	4	0
5	5	0
6	6	0
7	7	0
8	8	0
8	10	1
8	11	1
8	12	2
9	9	0
10	10	0
11	11	0
11	12	1
12	12	0

parent	child	depth
0	1	1
0	2	1
0	3	2
0	4	3
0	5	3
0	6	2
0	7	2
0	8	2
0	9	2
0	10	3
0	11	3
0	12	4
1	1	0
1	3	1
1	4	2
1	5	2
1	6	1
1	7	1
2	2	0
2	8	1
2	9	1
2	10	2
2	11	2
2	12	3
3	3	0
3	4	1
3	5	1
4	4	0
5	5	0
6	6	0
7	7	0
8	8	0
8	10	1
8	11	1
8	12	2
9	9	0
10	10	0
11	11	0
11	12	1
12	12	0