Ruling the hierarchy
Storing and querying hierarchical data in a relational database
Philip Brown
- twitter.com/philipbrown
- github.com/philipbrown

culttt.com
Overview
- The Problem
- The Options
- The Solution
The Problem
Papertrail is a SaaS application that helps companies comply with health and safety legislation
We allow users to organise their data into hierarchical folders

An account's folder structure is not limited in breadth or depth
We need to be able to pull the entire folder structure out in a single query
It should be quick and easy to add a new folder at any position in the hierarchy
We need to allow single folders, or whole subsections, to be moved within the hierarchy
We need to be able to get the parent of a given folder
We need to be able to retrieve the children of a folder
We need to be able to retrieve the siblings of a folder
We need to be able to retrieve all descendants of a folder
We need to be able to retrieve all ancestors of a folder
We also need to deal with folder permissions and "starred" folders
The Options
We looked at...
- Adjacency List
- Nested Sets
- Closure Table
Each has positives and negatives
You need to pick the appropriate solution for your requirements
Adjacency List
| id | parent_id | name |
|---|---|---|
| 1 | null | Acme Adventure |
| 2 | 1 | Outdoors |
| 3 | 2 | Rock Climbing |
| 4 | 3 | Helmets |
| 5 | 1 | Equipment |
| 6 | 5 | Safety |
- The naive method for storing a hierarchy
- Every item knows its parent
Adjacency List
Adjacency List
- Easy to add a new item
- Easy to move items
- Easy to get the immediate parent
- Easy to get the immediate children
Adjacency List
- Difficult to handle hierarchies of unlimited depth
When to use Adjacency List?
- You need a simple solution
- You don't need a deep hierarchy
Nested Sets
Nested Sets
- Each item has two numbers
- Left number is less than all descendants
- Right number is greater than all descendants
Nested Sets
Acme Adventure
Outdoors
Rock Climbing
Helmets
Equipment
Safety
1
2
3
4
5
6
7
8
9
10
11
12
Nested Sets
| id | left | right | name |
|---|---|---|---|
| 1 | 1 | 12 | Acme Adventure |
| 2 | 2 | 7 | Outdoors |
| 3 | 3 | 6 | Rock Climbing |
| 4 | 4 | 5 | Helmets |
| 5 | 8 | 11 | Equipment |
| 6 | 9 | 10 | Safety |
Nested Sets
Ancestors
Select items where the numbers are outside the numbers of the given item
Descendants
Select items where the numbers are between the numbers of the given item
Nested Sets
Add new item
Recalculate all left values for all items on the right of the new item. Recalculate all right values for all items above and to the right of the new item.
Finding the immediate parent or children
The parent of an item is the ancestor who has no descendant who is also an ancestor of that item
When to use Nested Sets?
- You need to query ancestors and descendants
- The hierarchy won't change very often
Closure Table
Closure Table
- Uses a second table to store the paths
- Store every path from each item to it's descendants
- Each item is also connected to itself
| id | name |
|---|---|
| 1 | Acme Adventure |
| 2 | Outdoors |
| 3 | Rock Climbing |
| ancestor | descendant |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 1 | 2 |
| 3 | 3 |
| 1 | 3 |
| 2 | 3 |
Closure Table
folders
paths
Closure Table
- Querying ancestors and descendants is simple
- Inserting a new item is simple
- Deleting paths is simple
- Finding the immediate parent or children is simple
- Moving an item requires two (fairly complex) queries
Closure Table
- Requires two tables
The Solution
We chose Closure Tables for the following reasons
- Query the ancestors and descendants
- Find the immediate parent and children
- Move folders or whole subtrees easily
- Having two tables was an acceptable compromise
We use Laravel and Eloquent at Papertrail
github.com/franzose/ClosureTable
However we decided not to use it
Instead it was implemented using simple queries that can be run during each procedure
Example Code
github.com/philipbrown/ruling-the-hierarchy
$conn = new PDO('mysql...');
$root = Folder::create($conn, 'Acme Adventure');
$parent = Folder::create($conn, 'Outdoors', $root->id);/**
* Create a new Folder
*
* @param PDO $conn
* @param int $parent_id
* @return Folder
*/
public static function create(PDO $conn, $name, $parent_id = null)
{
// Insert the row
$statement = $conn->prepare('
INSERT INTO folders (name, parent_id) VALUES(?, ?)');
$statement->execute([$name, $parent_id]);
// Get the id
$id = $conn->lastInsertId();
// Create the folder
$folder = new self($conn);
$folder->id = $id;
$folder->name = $name;
$folder->parent_id = $parent_id;
// Set the paths
$folder->setFolderPaths();
return $folder;
}/**
* Set the folder paths
*
* @return void
*/
private function setFolderPaths()
{
$descendant = $this->id;
$ancestor = $this->parent_id ? $this->parent_id : $descendant;
$statement = $this->conn->prepare('
INSERT INTO paths (ancestor_id, descendant_id, depth)
SELECT ancestor_id, ?, depth+1
FROM paths
WHERE descendant_id = ?
UNION ALL SELECT ?, ?, 0
');
$statement->execute([$descendant, $ancestor, $descendant, $descendant]);
}$root = Folder::create($conn, 'Root');
$parent = Folder::create($conn, 'Parent', $root->id);
$child = Folder::create($conn, 'Child', $parent->id);
$child->delete();/**
* Delete the Folder
*
* @return void
*/
public function delete()
{
$this->deleteFolderPaths();
$statement = $this->conn->prepare('DELETE FROM folders WHERE id = ?');
$statement->execute([$this->id]);
}/**
* Delete the folder paths
*
* @return void
*/
private function deleteFolderPaths()
{
$statement = $this->conn->prepare('
DELETE FROM paths
WHERE descendant_id IN (
SELECT descendant_id FROM (
SELECT descendant_id FROM paths
WHERE ancestor_id = ?
) as tmptable
)
');
$statement->execute([$this->id]);
}$root = Folder::create($conn, 'Acme Adventures');
$grandparent1 = Folder::create($conn, 'Outdoors', $root->id);
$grandparent2 = Folder::create($conn, 'Equipment', $root->id);
$parent1 = Folder::create($conn, 'Rock Climbing', $grandparent1->id);
$parent2 = Folder::create($conn, 'Safety', $grandparent2->id);
$child = Folder::create($conn, 'Helmets', $parent1->id);
$child->move($parent2->id);/**
* Move the Folder to a new parent
*
* @param string $parent_id
* @return void
*/
public function move($parent_id)
{
$statement = $this->conn->prepare('
UPDATE folders SET parent_id = ? WHERE id = ?');
$statement->execute([$parent_id, $this->id]);
$this->parent_id = $parent_id;
$this->resetFolderPaths();
}/**
* Reset the folder paths
*
* @return void
*/
private function resetFolderPaths()
{
$statement = $this->conn->prepare('
DELETE FROM paths
WHERE descendant_id IN (
SELECT d FROM (
SELECT descendant_id as d FROM paths
WHERE ancestor_id = ?
) as dct
)
AND ancestor_id IN (
SELECT a FROM (
SELECT ancestor_id AS a FROM paths
WHERE descendant_id = ?
AND ancestor_id <> ?
) as ct
)
');
$statement->execute([$this->id, $this->id, $this->id]);
$statement = $this->conn->prepare('
INSERT INTO paths (ancestor_id, descendant_id, depth)
SELECT supertbl.ancestor_id, subtbl.descendant_id, supertbl.depth+subtbl.depth+1
FROM paths as supertbl
CROSS JOIN paths as subtbl
WHERE supertbl.descendant_id = ?
AND subtbl.ancestor_id = ?
');
$statement->execute([$this->parent_id, $this->id]);
}/**
* Get the children of a Folder
*
* @return array
*/
public function children()
{
$statement = $this->conn->prepare('
SELECT * FROM folders WHERE parent_id = ?');
$statement->execute([$this->id]);
return $statement->fetchAll();
}/**
* Get the siblings of a Folder
*
* @return array
*/
public function siblings()
{
$statement = $this->conn->prepare('
SELECT * FROM folders WHERE parent_id = ? AND id != ?');
$statement->execute([$this->parent_id, $this->id]);
return $statement->fetchAll();
}/**
* Get the ancestors of a Folder
*
* @return array
*/
public function ancestors()
{
$statement = $this->conn->prepare('
SELECT folders.* FROM folders
INNER JOIN paths ON paths.ancestor_id = folders.id
WHERE paths.descendant_id = ?
AND paths.depth > 0
');
$statement->execute([$this->id]);
return $statement->fetchAll();
}/**
* Get the descendants of a Folder
*
* @return array
*/
public function descendants()
{
$statement = $this->conn->prepare('
SELECT folders.* FROM folders
INNER JOIN paths ON paths.descendant_id = folders.id
WHERE paths.ancestor_id = ?
AND paths.depth > 0
');
$statement->execute([$this->id]);
return $statement->fetchAll();
}github.com/philipbrown/ruling-the-hierarchy
Conclusion
There are a number of ways to deal with this problem
It is very important that you choose the right solution for your requirements
Closure Table has been an excellent choice at Papertrail
We have ~110k folders and ~410k paths
We have not had a single query related problem with folders
The performance has been excellent
And the code has not been touched since we launched the product
Thanks for listening!
- twitter.com/philipbrown
- github.com/philipbrown
- papertrail.io
- culttt.com
- github.com/franzose/ClosureTable
- github.com/philipbrown/ruling-the-hierarchy
Questions?
Ruling the hierarchy
By Philip Brown
Ruling the hierarchy
Storing and querying hierarchical data in a relational database
- 726