Containers

 

by David Thomas — 8th December 2016

"Containers" is
two things

  1. A library of associative array data structures
  2. A regularised interface to those data structures

Impetus

  • I wanted a data structure that factored out common pathnames
  • My image viewer's tagging system maps filenames to tags
  • It may store hundreds of full paths like:
    • ADFS::HardDisc4.$.Images.Fred
    • ADFS::HardDisc4.$.Images.Jim
    • ADFS::HardDisc4.$.Images.Sheila
  • Wasteful: how can I factor out all the directory names?

Learn About The Trees!

So I went off down the rabbit hole and researched and wrote a load of tree stuff unrelated to the original inquiry

 

Oops

 

Trees Galore

Containers implements:

  • Binary Search Tree
  • Digital Search Tree
  • Trie
  • Crit-bit
  • PATRICIA

 

Let's have a look...

Binary Search Tree

Invented: 1960

Uses a compare callback

Compares the whole entry

Chooses left/right from compare result

Pros

Pretty simple

Cons

Quality of tree depends on insertion order:
inserting in-order data will lead to a lopsided tree
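
To make the compare callback and the insertion-order problem concrete, here is a minimal sketch of a binary search tree insert. All of the names (bst_node, bst_insert, bst_height, compare_strings) are made up for illustration and are not the Containers API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: this is a sketch, not the Containers API. */
typedef int (bst_compare_fn)(const void *a, const void *b);

typedef struct bst_node {
    struct bst_node *child[2]; /* [0] = left, [1] = right */
    const void      *key;
} bst_node;

/* Insert key and return the (possibly new) root. Duplicates are ignored. */
static bst_node *bst_insert(bst_node *root, const void *key,
                            bst_compare_fn *compare)
{
    bst_node **link = &root;

    while (*link != NULL) {
        int c = compare(key, (*link)->key); /* compares the whole entry */
        if (c == 0)
            return root;                    /* already present */
        link = &(*link)->child[c > 0];      /* result picks left or right */
    }

    *link = calloc(1, sizeof(**link));
    assert(*link != NULL);
    (*link)->key = key;
    return root;
}

/* Height of the tree: 0 when empty, 1 for a single node. */
static int bst_height(const bst_node *n)
{
    if (n == NULL)
        return 0;
    int l = bst_height(n->child[0]);
    int r = bst_height(n->child[1]);
    return 1 + (l > r ? l : r);
}

/* Compare callback for NUL-terminated strings. */
static int compare_strings(const void *a, const void *b)
{
    return strcmp(a, b);
}
```

Inserting "a", "b", "c" in that order gives a height of 3 (effectively a linked list), whereas inserting "b", "a", "c" gives a height of 2.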

Digital Search Tree

Invented: ?

Examines keys bitwise to decide how to branch

First bit at first level, second bit at second level, & so on

Follows the same path for as long as the key bits are equal

Single type of tree node

Pros

Relatively simple
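
The bitwise branching can be sketched as below, assuming fixed-width 32-bit keys examined from the most significant bit down. The names (dst_node, dst_bit, dst_insert, dst_lookup) are hypothetical and not taken from Containers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: a single node type that holds both key and links. */
typedef struct dst_node {
    struct dst_node *child[2];
    uint32_t         key;
} dst_node;

/* Bit 'depth' of key, counting from the most significant bit. */
static int dst_bit(uint32_t key, int depth)
{
    return (int) ((key >> (31 - depth)) & 1u);
}

/* Insert key and return the (possibly new) root. Duplicates are ignored. */
static dst_node *dst_insert(dst_node *root, uint32_t key)
{
    dst_node **link = &root;
    int depth = 0;

    while (*link != NULL) {
        if ((*link)->key == key)
            return root;                               /* already present */
        link = &(*link)->child[dst_bit(key, depth++)]; /* one bit per level */
    }

    *link = calloc(1, sizeof(**link));
    assert(*link != NULL);
    (*link)->key = key;
    return root;
}

/* Return the node holding key, or NULL if it is absent. */
static const dst_node *dst_lookup(const dst_node *n, uint32_t key)
{
    for (int depth = 0; n != NULL; depth++) {
        if (n->key == key)
            return n;
        n = n->child[dst_bit(key, depth)];
    }
    return NULL;
}
```

Unlike the binary search tree, no compare callback is needed: the key's own bits choose the branch, and an equality check at each node decides whether the search is finished.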

 

Trie

Invented: 1959

Name comes from "reTRIEval", pronunciation...

Keeps the keys in the tree in order

Two types of node: internal and external

Internal nodes differentiate key bits

External nodes (leaves) hold the data

Cons

Creates a node per differing key bit - lots of internal nodes
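
A minimal sketch of a binary trie shows where the internal nodes come from. It assumes 8-bit keys to keep the numbers small, and the names (trie_node, trie_insert, trie_count_internal) are invented for illustration rather than taken from Containers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: one struct used for both kinds of node. */
typedef struct trie_node {
    int               is_leaf;  /* 1 = external node, 0 = internal node */
    struct trie_node *child[2]; /* used by internal nodes only */
    uint8_t           key;      /* used by external nodes only */
} trie_node;

/* Bit 'depth' of key, counting from the most significant bit. */
static int trie_bit(uint8_t key, int depth)
{
    return (key >> (7 - depth)) & 1;
}

static trie_node *trie_new(int is_leaf, uint8_t key)
{
    trie_node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->is_leaf = is_leaf;
    n->key     = key;
    return n;
}

/* Insert key and return the (possibly new) root. Duplicates are ignored. */
static trie_node *trie_insert(trie_node *root, uint8_t key)
{
    trie_node **link = &root;
    int depth = 0;

    /* Walk the internal nodes, branching on one key bit per level. */
    while (*link != NULL && !(*link)->is_leaf)
        link = &(*link)->child[trie_bit(key, depth++)];

    if (*link == NULL) {
        *link = trie_new(1, key);
        return root;
    }

    trie_node *old = *link;
    if (old->key == key)
        return root;

    /* Push the old leaf down: one internal node per bit until they differ. */
    for (;;) {
        trie_node *in = trie_new(0, 0);
        int ob = trie_bit(old->key, depth);
        int nb = trie_bit(key, depth);
        *link = in;
        depth++;
        if (ob != nb) {
            in->child[ob] = old;
            in->child[nb] = trie_new(1, key);
            return root;
        }
        link = &in->child[nb];
    }
}

/* Count the internal nodes in the trie. */
static int trie_count_internal(const trie_node *n)
{
    if (n == NULL || n->is_leaf)
        return 0;
    return 1 + trie_count_internal(n->child[0])
             + trie_count_internal(n->child[1]);
}
```

In this sketch, storing just the two keys 0x00 and 0x01, which differ only in their last bit, needs eight internal nodes to tell two leaves apart.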

Crit-Bit

Much like a trie, but avoids profligate internal nodes: each internal node stores the index of the critical (first differing) bit, so runs of shared bits are skipped

Has internal and external nodes
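
The "critical bit" is the first bit at which two keys differ. A small sketch of finding it, assuming byte-string keys compared over a caller-supplied length; the function name critbit_first_difference is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the first bit at which a and b
 * differ (bit 0 is the most significant bit of byte 0), or -1 if the first
 * len bytes are identical.
 */
static long critbit_first_difference(const uint8_t *a, const uint8_t *b,
                                     size_t len)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t) (a[i] ^ b[i]);
        if (diff != 0) {
            int bit = 0;
            while ((diff & 0x80) == 0) { /* find the highest set bit */
                diff = (uint8_t) (diff << 1);
                bit++;
            }
            return (long) (i * 8) + bit;
        }
    }
    return -1;
}
```

For "Images.Fred" and "Images.Jim" the first seven bytes match, and 'F' (0x46) and 'J' (0x4A) first differ at bit 4, so the critical bit is 60. An internal node that records this index can skip the 60 shared bits in one step.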

 

https://www.imperialviolet.org/binary/critbit.pdf

PATRICIA

Invented: 1968

A compact representation of a trie in which any node that is an only child is merged with its parent

A crit-bit tree but folded in on itself

Does away with the two node types:

Single type of node that's used in two ways

Cons

Very difficult to grok

I totally failed to write the delete function

It Gets Confusing

I'm not entirely convinced that the current descriptions on Wikipedia match the trees I've described, or indeed anyone else's idea of them

Containers (1)

  • We've now established that binary trees are very cool
  • But their interfaces aren't so regular
  • Ideally I'd have a regular interface so I can apply identical tests to all data structures
  • How difficult would this be in C?
    • It might be easier in C++
    • But C++ can get stuffed

Containers (2)

  • Defined icontainer interface
    • A struct of function pointers
  • For each datastruct, write a function which creates an icontainer, populates it and returns it
  • Different datastructs have different requirements
    • some want key lengths
    • some want key compare callbacks
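
The approach in the bullets above might look roughly like this in C. This is only a sketch: the struct is named sketch_icontainer to make clear that its members and signatures are assumptions, not the real icontainer definition, and the backing store is a trivial linked list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface: a struct of function pointers. */
typedef struct sketch_icontainer sketch_icontainer;

struct sketch_icontainer {
    int         (*insert)(sketch_icontainer *c, const void *key,
                          const void *value);
    const void *(*lookup)(const sketch_icontainer *c, const void *key);
    void        (*destroy)(sketch_icontainer *c);
};

typedef int (sketch_compare_fn)(const void *a, const void *b);

typedef struct list_item {
    struct list_item *next;
    const void       *key;
    const void       *value;
} list_item;

typedef struct list_container {
    sketch_icontainer  iface;   /* first member, so the pointers can be cast */
    sketch_compare_fn *compare; /* this datastruct wants a compare callback */
    list_item         *head;
} list_container;

/* Prepend an item. Duplicates are not checked; the newest entry wins. */
static int list_insert(sketch_icontainer *c, const void *key,
                       const void *value)
{
    list_container *lc = (list_container *) c;
    list_item *item = malloc(sizeof(*item));
    if (item == NULL)
        return -1;
    item->key   = key;
    item->value = value;
    item->next  = lc->head;
    lc->head    = item;
    return 0;
}

static const void *list_lookup(const sketch_icontainer *c, const void *key)
{
    const list_container *lc = (const list_container *) c;
    for (const list_item *i = lc->head; i != NULL; i = i->next)
        if (lc->compare(key, i->key) == 0)
            return i->value;
    return NULL;
}

static void list_destroy(sketch_icontainer *c)
{
    list_container *lc = (list_container *) c;
    while (lc->head != NULL) {
        list_item *next = lc->head->next;
        free(lc->head);
        lc->head = next;
    }
    free(lc);
}

/* Creates the interface, populates it and returns it. */
static sketch_icontainer *list_container_create(sketch_compare_fn *compare)
{
    assert(compare != NULL);
    list_container *lc = calloc(1, sizeof(*lc));
    if (lc == NULL)
        return NULL;
    lc->iface.insert  = list_insert;
    lc->iface.lookup  = list_lookup;
    lc->iface.destroy = list_destroy;
    lc->compare       = compare;
    return &lc->iface;
}

/* Compare callback for NUL-terminated strings. */
static int compare_strings(const void *a, const void *b)
{
    return strcmp(a, b);
}
```

Only the create function differs between data structures (this one takes a compare callback; another might take a key length), so a test can call c->insert and c->lookup without knowing which structure sits behind them.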

It's Not Just Trees

  • In addition to the binary tree data structures I've written implementations of:
    • linked list
    • ordered array
    • hash
  • Handy for performance comparisons
  • Containers interface is sufficiently regular to support these too

Tests

  • With the Containers interface in place, tests can now be built
  • Of course I didn't wait until the end to write all of the tests: I rewrote the original test to use Containers
  • Test data you've seen already
  • Four tests:
    • char test
    • int test
    • string test
    • common prefix string test
  • Tests output to Graphviz .dot files
  • I convert these to PDF to 'animate' them
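
As an illustration of the .dot output, here is a sketch that writes a small binary tree as a Graphviz digraph. The node type and function names (viz_node, viz_emit_tree) are hypothetical, and keys are assumed not to contain double quotes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical node type for illustration only. */
typedef struct viz_node {
    struct viz_node *child[2];
    const char      *key;
} viz_node;

/* Emit one "parent" -> "child" line per edge; returns the edge count. */
static int viz_emit_edges(FILE *f, const viz_node *n)
{
    int edges = 0;
    for (int i = 0; i < 2; i++) {
        if (n->child[i] == NULL)
            continue;
        fprintf(f, "\t\"%s\" -> \"%s\";\n", n->key, n->child[i]->key);
        edges += 1 + viz_emit_edges(f, n->child[i]);
    }
    return edges;
}

/* Write the whole tree as a digraph; returns the number of edges written. */
static int viz_emit_tree(FILE *f, const viz_node *root)
{
    int edges = 0;
    assert(f != NULL);
    fprintf(f, "digraph tree {\n");
    if (root != NULL) {
        fprintf(f, "\t\"%s\";\n", root->key); /* so a lone root still shows */
        edges = viz_emit_edges(f, root);
    }
    fprintf(f, "}\n");
    return edges;
}
```

A file written this way can be turned into a PDF with Graphviz, for example `dot -Tpdf tree.dot -o tree.pdf`.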

References

https://github.com/dpt/Containers

 

Sedgewick: Algorithms

http://algs4.cs.princeton.edu/home/
