Tries

David Anderson

Recurse Center W1 '16

Tries

  • Name comes from "retrieval"
  • Symbol Table (Similar API to Binary Search Trees (BST) and Hash Tables)
  • Sometimes called prefix trees
  • Easily search for values stored for alphanumeric keys - often without examining the entire key!

R-way Tries

Simplest implementation.

 

Each link corresponds to a single character of the alphabet; a key is spelled out by the path from the root.

Each node has R children, where R is the number of possible values (null nodes not pictured).

 

Can be costly in space (think 16-bit Unicode: a 65,536-way trie!)
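
A minimal sketch of the R-way layout described above, in Python. The names `Node`, `RWayTrie`, and the choice of R = 256 (extended ASCII) are assumptions for illustration and do not come from the slides; keys are assumed to contain only characters with code points below R.

```python
R = 256  # assumed alphabet size (extended ASCII); not specified in the slides


class Node:
    """One trie node: an optional value plus R child links."""

    def __init__(self):
        self.val = None
        self.next = [None] * R  # one slot per possible character


class RWayTrie:
    def __init__(self):
        self.root = Node()

    def put(self, key, val):
        """Walk or create one node per character, then store val at the end."""
        node = self.root
        for ch in key:
            i = ord(ch)
            if node.next[i] is None:
                node.next[i] = Node()
            node = node.next[i]
        node.val = val

    def get(self, key):
        """Return the value for key, or None on a search miss."""
        node = self.root
        for ch in key:
            node = node.next[ord(ch)]
            if node is None:
                return None  # miss found without reading the rest of key
        return node.val
```

Every node allocates R links even when most are empty, which is where the space cost comes from.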

Symbol Table API

Operation            Returns            Description
put(key, val)        N/A                add a new value for the given key
get(key)             value              retrieve the value paired with key
delete(key)          N/A                delete key and its corresponding value
keys()               iterable of keys   all keys
keysWithPrefix(s)    iterable of keys   keys that start with s
keysThatMatch(s)     iterable of keys   keys that match s (wildcards possible)
longestPrefixOf(s)   key                longest key that is a prefix of s
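
The two prefix operations are the ones a BST or hash table cannot offer cheaply, so here is a rough sketch of how they might look. It uses a dict-based trie rather than fixed R-way arrays; `DictTrie` and its snake_case method names are hypothetical, and stored values are assumed to be non-None.

```python
class DictTrie:
    """Trie whose nodes are dicts: {'val': value or None, 'next': {char: node}}."""

    def __init__(self):
        self.root = {"val": None, "next": {}}

    def put(self, key, val):
        node = self.root
        for ch in key:
            node = node["next"].setdefault(ch, {"val": None, "next": {}})
        node["val"] = val

    def _find(self, s):
        """Return the node reached by following s, or None."""
        node = self.root
        for ch in s:
            node = node["next"].get(ch)
            if node is None:
                return None
        return node

    def keys_with_prefix(self, s):
        """Yield every stored key that starts with s, in sorted order."""
        start = self._find(s)
        if start is None:
            return
        stack = [(s, start)]
        while stack:
            prefix, node = stack.pop()
            if node["val"] is not None:
                yield prefix
            # push in reverse so the smallest character is popped first
            for ch in sorted(node["next"], reverse=True):
                stack.append((prefix + ch, node["next"][ch]))

    def longest_prefix_of(self, s):
        """Return the longest stored key that is a prefix of s, or None."""
        node = self.root
        best = "" if node["val"] is not None else None
        for i, ch in enumerate(s):
            node = node["next"].get(ch)
            if node is None:
                break
            if node["val"] is not None:
                best = s[: i + 1]
        return best
```

With the keys "she", "shells", "shore", and "sea" stored, `keys_with_prefix("sh")` walks only the subtree under "sh", and `longest_prefix_of("shellsort")` stops as soon as the path runs out.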

Time Complexity

implementation             search hit      search miss   insert        space (references)
red-black BST              L + c lg^2 N    c lg^2 N      c lg^2 N      4N
hashing (linear probing)   L               L             L             4N to 16N
R-way trie                 L               log_R N       L             (R + 1) N
TST                        L + ln N        ln N          L + ln N      N
TST w/ R^2 root            L + ln N        ln N          L + ln N      N + R^2

L = key length, N = number of keys, R = alphabet size.

Uses

  • Word prediction (keyword completion, T9 texting)
  • Prefix matching and longest-prefix lookup (e.g. computational biology databases like BLAST and FASTA, network search, IP routing, XML search)

Variants - Ternary Search Tries (TST)

  • More efficient memory usage
  • Each node has 3 children: smaller (left), equal (middle), larger (right).
  • Each node holds a single character of the key.
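
A possible shape for a TST, sketched in Python under the assumption that keys are non-empty strings. `TSTNode`, `TST`, and the field names `left`, `mid`, and `right` are illustrative choices, not taken from the slides.

```python
class TSTNode:
    def __init__(self, ch):
        self.ch = ch        # the single character held by this node
        self.val = None     # value, if a key ends at this node
        self.left = None    # subtree for characters smaller than ch
        self.mid = None     # subtree for the next character of the key
        self.right = None   # subtree for characters larger than ch


class TST:
    def __init__(self):
        self.root = None

    def put(self, key, val):
        if not key:
            raise ValueError("key must be non-empty")
        self.root = self._put(self.root, key, 0, val)

    def _put(self, node, key, d, val):
        ch = key[d]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.left = self._put(node.left, key, d, val)
        elif ch > node.ch:
            node.right = self._put(node.right, key, d, val)
        elif d < len(key) - 1:
            node.mid = self._put(node.mid, key, d + 1, val)
        else:
            node.val = val
        return node

    def get(self, key):
        """Return the value for key, or None on a search miss."""
        if not key:
            return None
        node, d = self.root, 0
        while node is not None:
            ch = key[d]
            if ch < node.ch:
                node = node.left
            elif ch > node.ch:
                node = node.right
            elif d < len(key) - 1:
                node, d = node.mid, d + 1
            else:
                return node.val
        return None
```

Each node stores three links instead of R, which is the memory saving over the R-way trie; the price is extra left/right comparisons at each character.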

TST w/ R^2 Branching At Root

  • Root node has R^2 children (one for every combination of the first 2 letters of a key); each child is the root of a TST
  • Improve memory usage through de-duplication

Advanced Variants

  • PATRICIA trie aka crit-bit tree or radix tree
    • Practical Algorithm to Retrieve Information Coded in Alphanumeric (phew)
    • Remove one-way branching, each node represents sequence of characters
  • Suffix Tree
    • Patricia trie built from all suffixes of a string (rather than from a set of separate keys).
    • Locate substrings quickly, find matches for regular expressions, compute the longest common substring in linear time.
    • Tradeoff: higher storage cost.
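
To make the suffix idea concrete, here is a deliberately naive sketch: an uncompressed suffix trie built from nested dicts. It is not a real suffix tree (those compress one-way branches, Patricia style, and can be built in linear time); the function names `build_suffix_trie` and `contains` are made up for this example.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie (uncompressed)."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """True if pattern occurs anywhere in the indexed text."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

Because every substring is a prefix of some suffix, a substring query takes time proportional to the pattern length. This uncompressed version can need quadratic space in the text length, which illustrates the storage tradeoff.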
