Containers
by David Thomas — 8th December 2016
"Containers" is two things:
- A library of associative array data structures
- A regularised interface to those data structures
Impetus
- I wanted a data structure that factored out common pathnames
- My image viewer's tagging system maps filenames to tags
- It may store hundreds of full paths like:
ADFS::HardDisc4.$.Images.Fred
ADFS::HardDisc4.$.Images.Jim
ADFS::HardDisc4.$.Images.Sheila
- Wasteful: the same directory prefix is stored in every entry. How can I factor out all the directory names?
Learn About The Trees!
So I went off down the rabbit hole and researched and wrote a load of tree stuff unrelated to the original inquiry
Oops
Trees Galore
Containers implements:
- Binary Search Tree
- Digital Search Tree
- Trie
- Crit-bit
- PATRICIA
Let's have a look...
Binary Search Tree
Invented: 1960
Uses a compare callback
Compares the whole entry
Chooses left/right from compare result
Pros
Pretty simple
Cons
Quality of tree depends on insertion order:
inserting already-sorted data produces a lopsided tree, in the worst case a linked list with O(n) lookups
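The lopsided-tree problem is easy to reproduce. The following is a minimal sketch, not the Containers code: the names `bst_node`, `bst_insert`, `bst_height` and `compare_ints` are made up for illustration, and there is no balancing or deletion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout: this is not the Containers API. */
typedef int (*bst_compare_fn)(const void *a, const void *b);

typedef struct bst_node {
    struct bst_node *child[2]; /* [0] = left, [1] = right */
    const void      *item;     /* the whole entry, owned by the caller */
} bst_node;

/* Insert 'item' into the tree at '*root'.
 * Returns 0 on success, 1 if an equal item is already present,
 * -1 if memory runs out. */
static int bst_insert(bst_node **root, const void *item, bst_compare_fn compare)
{
    assert(root != NULL && compare != NULL);

    while (*root != NULL) {
        int c = compare(item, (*root)->item);
        if (c == 0)
            return 1;
        root = &(*root)->child[c > 0]; /* left if smaller, right if larger */
    }

    bst_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->child[0] = n->child[1] = NULL;
    n->item = item;
    *root = n;
    return 0;
}

/* Number of levels in the tree; 0 for an empty tree. */
static int bst_height(const bst_node *n)
{
    if (n == NULL)
        return 0;
    int l = bst_height(n->child[0]);
    int r = bst_height(n->child[1]);
    return 1 + (l > r ? l : r);
}

/* Example compare callback for int entries. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Inserting 1 to 7 in ascending order gives a tree of height 7, while inserting the same values as 4, 2, 6, 1, 3, 5, 7 gives height 3.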
Digital Search Tree
Invented: ?
Examines keys bitwise to decide how to branch
First bit at the first level, second bit at the second level, and so on
Follows the path chosen by the key's bits until it finds the key or an empty slot
Single type of tree node
Pros
Relatively simple
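A digital search tree along those lines might look like the sketch below. It assumes fixed-width 32-bit keys and tests bits from the most significant end; the names `dst_node`, `dst_bit`, `dst_insert` and `dst_lookup` are invented for this example and are not taken from the Containers source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: a single node type, each node holding one key. */
typedef struct dst_node {
    struct dst_node *child[2];
    uint32_t         key;
} dst_node;

/* Bit 'depth' of 'key', counting from the most significant bit. */
static int dst_bit(uint32_t key, unsigned depth)
{
    assert(depth < 32);
    return (int)((key >> (31u - depth)) & 1u);
}

/* Returns 0 on success, 1 if the key is already present, -1 if out of memory. */
static int dst_insert(dst_node **root, uint32_t key)
{
    assert(root != NULL);

    unsigned depth = 0;
    while (*root != NULL) {
        if ((*root)->key == key)
            return 1;
        /* Level N branches on bit N of the key being inserted. */
        root = &(*root)->child[dst_bit(key, depth)];
        depth++;
    }

    dst_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->child[0] = n->child[1] = NULL;
    n->key = key;
    *root = n;
    return 0;
}

/* Returns the node holding 'key', or NULL if it is absent. */
static const dst_node *dst_lookup(const dst_node *n, uint32_t key)
{
    for (unsigned depth = 0; n != NULL && n->key != key; depth++)
        n = n->child[dst_bit(key, depth)];
    return n;
}
```

Two distinct keys must differ within 32 bits, so the walk never needs a bit beyond the last one. Note that, unlike the binary search tree, no compare callback is involved: the key's own bits steer the search.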
Trie
Invented: 1959
Name comes from "reTRIEval", pronunciation...
Keeps the keys in the tree in order
Two types of node: internal and external
Internal nodes differentiate key bits
External nodes (leaves) hold the data
Cons
Creates an internal node for every bit up to the point where two keys differ, so similar keys mean lots of internal nodes
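To see where the internal nodes come from, here is a rough sketch of a binary trie over 8-bit keys. It is an illustration only: `trie_node`, `trie_insert`, `trie_contains` and `trie_count_internal` are hypothetical names, and running out of memory simply aborts to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not the Containers API. Keys are 8 bits wide. */
typedef struct trie_node {
    struct trie_node *child[2]; /* used by internal nodes only */
    int               is_leaf;  /* 1 = external node, 0 = internal node */
    uint8_t           key;      /* valid for external nodes only */
} trie_node;

/* Bit 'depth' of 'key', counting from the most significant bit. */
static int trie_bit(uint8_t key, unsigned depth)
{
    assert(depth < 8);
    return (key >> (7u - depth)) & 1;
}

/* Allocate a node; aborting on failure is a simplification. */
static trie_node *trie_new(int is_leaf, uint8_t key)
{
    trie_node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->child[0] = n->child[1] = NULL;
    n->is_leaf = is_leaf;
    n->key = key;
    return n;
}

/* Insert 'key' into the subtree 'n' found at 'depth'; returns the new subtree. */
static trie_node *trie_insert(trie_node *n, uint8_t key, unsigned depth)
{
    if (n == NULL)
        return trie_new(1, key);

    if (n->is_leaf) {
        if (n->key == key)
            return n; /* already present */
        /* Push the existing leaf one level down, beneath a new internal node. */
        trie_node *internal = trie_new(0, 0);
        internal->child[trie_bit(n->key, depth)] = n;
        n = internal;
    }

    int b = trie_bit(key, depth);
    n->child[b] = trie_insert(n->child[b], key, depth + 1);
    return n;
}

/* Returns 1 if 'key' is stored in the trie, 0 otherwise. */
static int trie_contains(const trie_node *n, uint8_t key)
{
    unsigned depth = 0;
    while (n != NULL && !n->is_leaf)
        n = n->child[trie_bit(key, depth++)];
    return n != NULL && n->key == key;
}

/* Number of internal nodes in the trie. */
static int trie_count_internal(const trie_node *n)
{
    if (n == NULL || n->is_leaf)
        return 0;
    return 1 + trie_count_internal(n->child[0]) + trie_count_internal(n->child[1]);
}
```

Storing 0x00 and 0x80, which differ in the first bit, needs one internal node. Storing 0x00 and 0x01, which differ only in the last bit, needs eight.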
Crit-bit
Much like a trie but avoids the profligate internal nodes: each internal node records the index of the critical (first differing) bit, so runs of shared bits are skipped
Has internal and external nodes
https://www.imperialviolet.org/binary/critbit.pdf
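The bit counting can be sketched as follows, assuming 32-bit keys with bits numbered from the most significant end. `critbit_index`, `cb_node` and `cb_walk` are hypothetical names used for illustration; they are not lifted from the paper linked above or from Containers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first bit, counting from the most significant, at which
 * 'a' and 'b' differ; -1 if they are equal. */
static int critbit_index(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    if (diff == 0)
        return -1;

    int index = 0;
    while ((diff & 0x80000000u) == 0) {
        diff <<= 1;
        index++;
    }
    return index;
}

/* Hypothetical node layout: one struct serving as either node type. */
typedef struct cb_node {
    struct cb_node *child[2];    /* internal nodes only */
    unsigned        bit;         /* internal: index of the bit to test */
    uint32_t        key;         /* external: the stored key */
    int             is_external; /* 1 = external, 0 = internal */
} cb_node;

/* Walk to the only external node that could hold 'key'. The caller still
 * has to compare the full key, because skipped bits are never examined. */
static const cb_node *cb_walk(const cb_node *n, uint32_t key)
{
    while (n != NULL && !n->is_external) {
        assert(n->bit < 32);
        n = n->child[(key >> (31u - n->bit)) & 1u];
    }
    return n;
}
```

Keys 0 and 1 differ only at bit index 31, so a single internal node testing bit 31 separates them, where a plain binary trie would need a chain of internal nodes. The final full-key comparison is the price of skipping bits.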
PATRICIA
Invented: 1968
A compact representation of a trie in which any node that is an only child is merged with an ancestor
A crit-bit tree but folded in on itself
Does away with the two node types:
Single type of node that's used in two ways
Cons
Very difficult to grok
I totally failed to write the delete function
It Gets Confusing
I'm not entirely convinced that the current descriptions on Wikipedia match the trees I've described, or indeed anyone else's idea of them
Containers (1)
- We've now established that binary trees are very cool
- But their interfaces aren't so regular
- Ideally I'd have a regular interface so I can apply identical tests to all data structures
- How difficult would this be in C?
- It might be easier in C++
- But C++ can get stuffed
Containers (2)
- Defined icontainer interface
- A struct of function pointers
- For each datastruct write a function which creates icontainer, populates it and returns it
- Different datastructs have different requirements
- some want key lengths
- some want key compare callbacks
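A struct-of-function-pointers interface could be sketched like this. It is an assumption-laden illustration rather than the real thing: the members of `icontainer_t`, the factory `container_create_list` and the linked-list backing are all invented here, and the actual icontainer in the repository may look quite different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface: the real icontainer may differ. */
typedef struct icontainer icontainer_t;

struct icontainer {
    int         (*insert)(icontainer_t *c, const void *key, const void *value);
    const void *(*lookup)(const icontainer_t *c, const void *key);
    void        (*destroy)(icontainer_t *c);
};

typedef int (*compare_fn)(const void *a, const void *b);

typedef struct list_entry {
    struct list_entry *next;
    const void        *key;
    const void        *value;
} list_entry;

typedef struct {
    icontainer_t  base;    /* first member, so the two pointer types convert */
    compare_fn    compare; /* this data structure wants a compare callback */
    list_entry   *head;
} list_container;

static list_entry *list_find(const list_container *lc, const void *key)
{
    for (list_entry *e = lc->head; e != NULL; e = e->next)
        if (lc->compare(e->key, key) == 0)
            return e;
    return NULL;
}

/* Returns 0 on success, 1 if the key exists, -1 if out of memory. */
static int list_insert(icontainer_t *c, const void *key, const void *value)
{
    list_container *lc = (list_container *)c;
    if (list_find(lc, key) != NULL)
        return 1;
    list_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = key;
    e->value = value;
    e->next = lc->head;
    lc->head = e;
    return 0;
}

static const void *list_lookup(const icontainer_t *c, const void *key)
{
    const list_entry *e = list_find((const list_container *)c, key);
    return e != NULL ? e->value : NULL;
}

static void list_destroy(icontainer_t *c)
{
    list_container *lc = (list_container *)c;
    while (lc->head != NULL) {
        list_entry *next = lc->head->next;
        free(lc->head);
        lc->head = next;
    }
    free(lc);
}

/* Creates the data structure, populates the interface and returns it. */
static icontainer_t *container_create_list(compare_fn compare)
{
    assert(compare != NULL);
    list_container *lc = malloc(sizeof *lc);
    if (lc == NULL)
        return NULL;
    lc->base.insert = list_insert;
    lc->base.lookup = list_lookup;
    lc->base.destroy = list_destroy;
    lc->compare = compare;
    lc->head = NULL;
    return &lc->base;
}

/* Example compare callback for C string keys. */
static int compare_strings(const void *a, const void *b)
{
    return strcmp(a, b);
}
```

Callers only ever see `icontainer_t *`, so code such as `c->insert(c, key, value)` works unchanged whichever data structure sits behind it. Only the create function differs per data structure, which is where requirements such as a key length or a compare callback get passed in.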
It's Not Just Trees
- In addition to the binary tree data structures I've written implementations of:
- linked list
- ordered array
- hash
- Handy for performance comparisons
- Containers interface is sufficiently regular to support these too
Tests
- With the Containers interface in place, one set of tests can be run against every data structure
- Of course I didn't wait until the end to write all of the tests: I rewrote the original test to use Containers
- Test data you've seen already
- Four tests:
- char test
- int test
- string test
- common prefix string test
- Tests output to Graphviz .dot files
- I convert these to PDF to 'animate' them
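Dumping a tree as Graphviz input takes very little code. The sketch below assumes a plain binary tree with unique int keys; `tree_node`, `dot_walk` and `tree_to_dot` are hypothetical names, and the output produced by the real tests may be laid out differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical node type with unique int keys, used as Graphviz node names. */
typedef struct tree_node {
    struct tree_node *child[2]; /* [0] = left, [1] = right */
    int               key;
} tree_node;

/* Emit one node and its outgoing edges, then recurse.
 * Returns the number of nodes written. */
static int dot_walk(FILE *f, const tree_node *n)
{
    if (n == NULL)
        return 0;

    int count = 1;
    fprintf(f, "  \"%d\";\n", n->key);
    for (int i = 0; i < 2; i++) {
        if (n->child[i] == NULL)
            continue;
        fprintf(f, "  \"%d\" -> \"%d\" [label=\"%s\"];\n",
                n->key, n->child[i]->key, i ? "R" : "L");
        count += dot_walk(f, n->child[i]);
    }
    return count;
}

/* Write the whole tree as a DOT digraph. Returns the number of nodes. */
static int tree_to_dot(FILE *f, const tree_node *root)
{
    assert(f != NULL);
    fprintf(f, "digraph tree {\n");
    int count = dot_walk(f, root);
    fprintf(f, "}\n");
    return count;
}
```

Writing one file per insertion and rendering each with something like `dot -Tpdf tree.dot -o tree.pdf` gives a sequence of pages that can be flipped through as an animation.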
References
https://github.com/dpt/Containers
Sedgewick: Algorithms
http://algs4.cs.princeton.edu/home/
An exploration of some binary trees I coded up.