Ruling the hierarchy

Storing and querying hierarchical data in a relational database

Philip Brown

  • twitter.com/philipbrown
  • github.com/philipbrown

culttt.com

Overview

  • The Problem
  • The Options
  • The Solution

The Problem

Papertrail is a SaaS application that helps companies comply with health and safety legislation

We allow users to organise their data into hierarchical folders

An account's folder structure is not limited in breadth or depth

We need to be able to pull the entire folder structure out in a single query

It should be quick and easy to add a new folder at any position in the hierarchy

We need to allow single folders, or whole subsections, to be moved within the hierarchy

We need to be able to get the parent of a given folder

We need to be able to retrieve the children of a folder

We need to be able to retrieve the siblings of a folder

We need to be able to retrieve all descendants of a folder

We need to be able to retrieve all ancestors of a folder

We also need to deal with folder permissions and "starred" folders

The Options

We looked at...

  • Adjacency List
  • Nested Sets
  • Closure Table

Each has positives and negatives

You need to pick the appropriate solution for your requirements

Adjacency List

id parent_id name
1 null Acme Adventure
2 1 Outdoors
3 2 Rock Climbing
4 3 Helmets
5 1 Equipment
6 5 Safety
  • The naive method for storing a hierarchy
  • Every item knows its parent

Adjacency List

Adjacency List

  • Easy to add a new item
  • Easy to move items
  • Easy to get the immediate parent
  • Easy to get the immediate children

Adjacency List

  • Difficult to handle hierarchies of unlimited depth

When to use Adjacency List?

  • You need a simple solution
  • You don't need a deep hierarchy

Nested Sets

Nested Sets

  • Each item has two numbers
  • Left number is less than all descendants
  • Right number is greater than all descendants

Nested Sets

Acme Adventure

Outdoors

Rock Climbing

Helmets

Equipment

Safety

1

2

3

4

5

6

7

8

9

10

11

12

Nested Sets

id left right name
1 1 12 Acme Adventure
2 2 7 Outdoors
3 3 6 Rock Climbing
4 4 5 Helmets
5 8 11 Equipment
6 9 10 Safety

Nested Sets

Ancestors

Select items where the numbers are outside the numbers of the given item

Descendants

Select items where the numbers are between the numbers of the given item

Nested Sets

Add new item

Recalculate all left values for all items on the right of the new item. Recalculate all right values for all items above and to the right of the new item.

Finding the immediate parent or children

The parent of an item is the ancestor who has no descendant who is also an ancestor of that item

When to use Nested Sets?

  • You need to query ancestors and descendants
  • The hierarchy won't change very often

Closure Table

Closure Table

  • Uses a second table to store the paths
  • Store every path from each item to it's descendants
  • Each item is also connected to itself
id name
1 Acme Adventure
2 Outdoors
3 Rock Climbing
ancestor descendant
1 1
2 2
1 2
3 3
1 3
2 3

Closure Table

folders

paths

Closure Table

  • Querying ancestors and descendants is simple
  • Inserting a new item is simple
  • Deleting paths is simple
  • Finding the immediate parent or children is simple
  • Moving an item requires two (fairly complex) queries

Closure Table

  • Requires two tables

The Solution

We chose Closure Tables for the following reasons

  • Query the ancestors and descendants
  • Find the immediate parent and children
  • Move folders or whole subtrees easily
  • Having two tables was an acceptable compromise

We use Laravel and Eloquent at Papertrail

github.com/franzose/ClosureTable

However we decided not to use it

Instead it was implemented using simple queries that can be run during each procedure

Example Code

github.com/philipbrown/ruling-the-hierarchy

$conn = new PDO('mysql...');

$root = Folder::create($conn, 'Acme Adventure');

$parent = Folder::create($conn, 'Outdoors', $root->id);
/**
 * Create a new Folder
 *
 * @param PDO $conn
 * @param int $parent_id
 * @return Folder
 */
public static function create(PDO $conn, $name, $parent_id = null)
{
    // Insert the row
    $statement = $conn->prepare('
        INSERT INTO folders (name, parent_id) VALUES(?, ?)');
    $statement->execute([$name, $parent_id]);

    // Get the id
    $id = $conn->lastInsertId();

    // Create the folder
    $folder            = new self($conn);
    $folder->id        = $id;
    $folder->name      = $name;
    $folder->parent_id = $parent_id;

    // Set the paths
    $folder->setFolderPaths();

    return $folder;
}
/**
 * Set the folder paths
 *
 * @return void
 */
private function setFolderPaths()
{
    $descendant = $this->id;

    $ancestor = $this->parent_id ? $this->parent_id : $descendant;

    $statement = $this->conn->prepare('
        INSERT INTO paths (ancestor_id, descendant_id, depth)
        SELECT ancestor_id, ?, depth+1
        FROM paths
        WHERE descendant_id = ?
        UNION ALL SELECT ?, ?, 0
    ');
    $statement->execute([$descendant, $ancestor, $descendant, $descendant]);
}
$root   = Folder::create($conn, 'Root');
$parent = Folder::create($conn, 'Parent', $root->id);
$child  = Folder::create($conn, 'Child',  $parent->id);

$child->delete();
/**
 * Delete the Folder
 *
 * @return void
 */
public function delete()
{
    $this->deleteFolderPaths();

    $statement = $this->conn->prepare('DELETE FROM folders WHERE id = ?');
    $statement->execute([$this->id]);
}
/**
 * Delete the folder paths
 *
 * @return void
 */
private function deleteFolderPaths()
{
    $statement = $this->conn->prepare('
        DELETE FROM paths
        WHERE descendant_id IN (
            SELECT descendant_id FROM (
                SELECT descendant_id FROM paths
                WHERE ancestor_id = ?
            ) as tmptable
        )
    ');

    $statement->execute([$this->id]);
}
$root         = Folder::create($conn, 'Acme Adventures');
$grandparent1 = Folder::create($conn, 'Outdoors', $root->id);
$grandparent2 = Folder::create($conn, 'Equipment', $root->id);
$parent1      = Folder::create($conn, 'Rock Climbing', $grandparent1->id);
$parent2      = Folder::create($conn, 'Safety', $grandparent2->id);
$child        = Folder::create($conn, 'Helmets', $parent1->id);

$child->move($parent2->id);
/**
 * Move the Folder to a new parent
 *
 * @param string $parent_id
 * @return void
 */
public function move($parent_id)
{
    $statement = $this->conn->prepare('
        UPDATE folders SET parent_id = ? WHERE id = ?');

    $statement->execute([$parent_id, $this->id]);

    $this->parent_id = $parent_id;

    $this->resetFolderPaths();
}
/**
 * Reset the folder paths
 *
 * @return void
 */
private function resetFolderPaths()
{
    $statement = $this->conn->prepare('
        DELETE FROM paths
        WHERE descendant_id IN (
            SELECT d FROM (
                SELECT descendant_id as d FROM paths
                WHERE ancestor_id = ?
            ) as dct
        )
        AND ancestor_id IN (
            SELECT a FROM (
                SELECT ancestor_id AS a FROM paths
                WHERE descendant_id = ?
                AND ancestor_id <> ?
            ) as ct
        )
    ');
    $statement->execute([$this->id, $this->id, $this->id]);

    $statement = $this->conn->prepare('
        INSERT INTO paths (ancestor_id, descendant_id, depth)
        SELECT supertbl.ancestor_id, subtbl.descendant_id, supertbl.depth+subtbl.depth+1
        FROM paths as supertbl
        CROSS JOIN paths as subtbl
        WHERE supertbl.descendant_id = ?
        AND subtbl.ancestor_id = ?
    ');

    $statement->execute([$this->parent_id, $this->id]);
}
/**
 * Get the children of a Folder
 *
 * @return array
 */
public function children()
{
    $statement = $this->conn->prepare('
        SELECT * FROM folders WHERE parent_id = ?');

    $statement->execute([$this->id]);

    return $statement->fetchAll();
}
/**
 * Get the siblings of a Folder
 *
 * @return array
 */
public function siblings()
{
    $statement = $this->conn->prepare('
        SELECT * FROM folders WHERE parent_id = ? AND id != ?');

    $statement->execute([$this->parent_id, $this->id]);

    return $statement->fetchAll();
}
/**
 * Get the ancestors of a Folder
 *
 * @return array
 */
public function ancestors()
{
    $statement = $this->conn->prepare('
        SELECT folders.* FROM folders
        INNER JOIN paths ON paths.ancestor_id = folders.id
        WHERE paths.descendant_id = ?
        AND paths.depth > 0
    ');

    $statement->execute([$this->id]);

    return $statement->fetchAll();
}
/**
 * Get the descendants of a Folder
 *
 * @return array
 */
public function descendants()
{
    $statement = $this->conn->prepare('
        SELECT folders.* FROM folders
        INNER JOIN paths ON paths.descendant_id = folders.id
        WHERE paths.ancestor_id = ?
        AND paths.depth > 0
    ');

    $statement->execute([$this->id]);

    return $statement->fetchAll();
}

github.com/philipbrown/ruling-the-hierarchy

Conclusion

There are a number of ways to deal with this problem

It is very important that you choose the right solution for your requirements

Closure Table has been an excellent choice at Papertrail

We have ~110k folders and ~410k paths

We have not had a single query related problem with folders

The performance has been excellent

And the code has not been touched since we launched the product

Thanks for listening!

  • twitter.com/philipbrown
  • github.com/philipbrown
  • papertrail.io
  • culttt.com
  • github.com/franzose/ClosureTable
  • github.com/philipbrown/ruling-the-hierarchy

Questions?

Made with Slides.com