COMP2521
Data Structures & Algorithms
Week 9.3
Tries
Author: Hayden Smith 2021
In this lecture
Why?
- Storing a large set of strings naively can be costly, so we need a more efficient way
What?
- Tries
- Tries insert
- Tries lookup
Tries
A trie is a data structure for representing a set of strings that supports O(L) lookup and insertion (where L is the length of the string being looked up or inserted)
Tries Conceptualised

Depth of trie = length of longest word
Tries Conceptualised

Note: Not every word here is included in the trie
Tries Structure
Each node in a trie:
- Contains one part of a key (typically one character)
- May have up to 26 children (one per lower-case letter)
- May be tagged as a "finishing" node
  - But even "finishing" nodes may have children
- May contain other data for application (e.g. word frequency)
- A "finishing" node marks the end of one key
#include <stdbool.h>
#define ALPHABET_SIZE 26
typedef struct Node *Trie;
typedef struct Node {
    char onechar;              // current char in key
    Trie child[ALPHABET_SIZE]; // one link per letter
    bool finish;               // last char in key?
    Item data;                 // no Item if !finish; Item is application-defined
} Node;
typedef char *Key; // just lower-case letters
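A minimal sketch of how a node of this type might be created. It assumes Item is an int and that keys use only 'a'..'z'. The helper names newNode and childIndex are hypothetical and not part of the lecture code.

```c
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26

typedef int Item; // assumption: Item is an int, e.g. a word frequency

typedef struct Node *Trie;
typedef struct Node {
    char onechar;              // current char in key
    Trie child[ALPHABET_SIZE]; // child[i] is the subtrie for letter 'a' + i
    bool finish;               // last char in key?
    Item data;                 // only meaningful if finish is true
} Node;

// Hypothetical helper: allocate a node for character c with no children.
// Returns NULL if memory allocation fails.
Trie newNode(char c) {
    Trie t = malloc(sizeof(Node));
    if (t == NULL) return NULL;
    t->onechar = c;
    t->finish = false;
    t->data = 0;
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        t->child[i] = NULL;
    }
    return t;
}

// Hypothetical helper: map 'a'..'z' to 0..25; return -1 for any other char.
int childIndex(char c) {
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}
```

Indexing the child array by c - 'a' is one common way to get a constant-time step from a node to the child for a given letter.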
Tries Search
find(trie,key):
|  Input  trie, key
|  Output pointer to element in trie if key found
|         NULL otherwise
|
|  node=trie
|  for each char c in key do
|  |  if node.child[c] exists then
|  |     node=node.child[c]  // move down one level
|  |  else
|  |     return NULL
|  |  end if
|  end for
|  if node.finish then  // "finishing" node reached?
|     return node
|  else
|     return NULL
|  end if
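
The search pseudocode above could be written in C roughly as follows. This is a sketch under assumptions: Item is taken to be an int, keys are lower-case only, and the key is passed as const char * rather than the Key typedef so that string literals can be used directly. It also treats an empty (NULL) trie as "not found".

```c
#include <stdbool.h>
#include <stddef.h>

#define ALPHABET_SIZE 26

typedef int Item; // assumption: Item is an int

typedef struct Node *Trie;
typedef struct Node {
    char onechar;
    Trie child[ALPHABET_SIZE];
    bool finish;
    Item data;
} Node;

// Return the finishing node for key, or NULL if key is not in the trie.
Trie find(Trie trie, const char *key) {
    Trie node = trie;
    for (const char *p = key; node != NULL && *p != '\0'; p++) {
        if (*p < 'a' || *p > 'z') {
            return NULL; // char outside the assumed alphabet
        }
        node = node->child[*p - 'a']; // move down one level (may be NULL)
    }
    // Only a node tagged as "finishing" counts as a match.
    return (node != NULL && node->finish) ? node : NULL;
}
```

Note that a prefix of a stored key (e.g. "ca" when only "cat" is stored) reaches a real node but is still reported as missing, because that node is not a finishing node.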

Tries Insertion
Trie insert(trie, item, key):
    // m = length of key
    t = trie
    if t is empty then:
        t = new trie node
    if m = 0 then: // end of key
        t.finish = true
        t.data = item
    else:
        first = key[0]
        rest = key[1..m-1]
        t.child[first] = insert(t.child[first], item, rest)
    return t
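
A possible C version of the recursive insertion, given as a sketch rather than the course's reference implementation. It assumes Item is an int, keys contain only 'a'..'z' (checked with assert), and that running out of memory should simply end the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define ALPHABET_SIZE 26

typedef int Item; // assumption: Item is an int

typedef struct Node *Trie;
typedef struct Node {
    char onechar;
    Trie child[ALPHABET_SIZE];
    bool finish;
    Item data;
} Node;

// Insert item under key and return the (possibly new) root of the trie.
Trie insert(Trie trie, Item item, const char *key) {
    Trie t = trie;
    if (t == NULL) { // empty trie: create a node to hold this position
        t = malloc(sizeof(Node));
        if (t == NULL) {
            fprintf(stderr, "insert: out of memory\n");
            exit(EXIT_FAILURE);
        }
        t->onechar = '\0'; // overwritten by the parent for non-root nodes
        t->finish = false;
        t->data = 0;
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            t->child[i] = NULL;
        }
    }
    if (key[0] == '\0') { // end of key: tag this node as finishing
        t->finish = true;
        t->data = item;
    } else {
        assert(key[0] >= 'a' && key[0] <= 'z'); // lower-case keys only
        int first = key[0] - 'a';
        t->child[first] = insert(t->child[first], item, key + 1);
        t->child[first]->onechar = key[0];
    }
    return t;
}
```

Typical use starts from an empty trie and reassigns the result each time, e.g. Trie t = NULL; t = insert(t, 1, "cat");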
Tries Complexity
- Space complexity: O(n) nodes
  - n = Sum of lengths of all strings
  - Each node also holds ALPHABET_SIZE child links
- Insertion complexity: O(m)
  - m = length of the key string
- Search complexity: O(m)
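
To make the space bound concrete, the sketch below counts the nodes in a trie. The function name countNodes is hypothetical and Item is assumed to be an int. For the keys "cat" and "car", n = 6 but the trie has only 5 nodes (root, c, a, t, r), because the shared prefix "ca" is stored once. In general the node count is at most n + 1.

```c
#include <stdbool.h>
#include <stddef.h>

#define ALPHABET_SIZE 26

typedef int Item; // assumption: Item is an int

typedef struct Node *Trie;
typedef struct Node {
    char onechar;
    Trie child[ALPHABET_SIZE];
    bool finish;
    Item data;
} Node;

// Hypothetical helper: count all nodes reachable from t (including t).
int countNodes(Trie t) {
    if (t == NULL) return 0;
    int count = 1; // this node
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        count += countNodes(t->child[i]);
    }
    return count;
}
```

The total memory used is roughly countNodes(t) * sizeof(Node), and most of each node is the array of child links.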
BST-like Tries
Above representation is space inefficient
- Each node has 26 possible children
- Even with very many keys, most child links are unused
If we allowed all ASCII chars in the alphabet, each node would have 128 children
We could reduce the branching factor by reducing the "alphabet"
- Break each 8-bit char into two 4-bit "nybbles"
- Branching factor is 16, even for the full ASCII char set
- But each branch is twice as long
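
The nybble idea above can be sketched as follows: each char of the key becomes two values in the range 0..15, which would then index a 16-way child array. The function names are hypothetical, and the example values in the comments assume an ASCII character set.

```c
#include <stddef.h>

// High 4 bits of c, e.g. 'a' (0x61) gives 6.
unsigned hiNybble(char c) {
    return ((unsigned char)c >> 4) & 0xF;
}

// Low 4 bits of c, e.g. 'a' (0x61) gives 1.
unsigned loNybble(char c) {
    return (unsigned char)c & 0xF;
}

// Write the nybble sequence for key into out and return its length.
// out must have room for 2 * strlen(key) entries.
size_t keyToNybbles(const char *key, unsigned char *out) {
    size_t n = 0;
    for (; *key != '\0'; key++) {
        out[n++] = (unsigned char)hiNybble(*key);
        out[n++] = (unsigned char)loNybble(*key);
    }
    return n;
}
```

A key of length m turns into a path of length 2m, which is the "each branch is twice as long" trade-off.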
Compressed Tries
- Have internal nodes of degree ≥ 2; each node contains ≥ 1 char
- Obtained by compressing non-branching chains of nodes
- A compact representation of a compressed trie that encodes an array S of strings requires O(s) space (s = number of strings in S)
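
One common way to get such a compact representation, shown here as an assumption about what the slide intends, is to have each node store three integers (i, j, k) meaning "the substring S[i][j..k]" instead of a copy of the characters. The type Label and the function labelToString are hypothetical names for this sketch.

```c
#include <stddef.h>
#include <string.h>

// Assumption: a node's label is the substring S[i][j..k] (inclusive),
// stored as three integers rather than as its own copy of the chars.
typedef struct Label {
    int i; // index of a string in S
    int j; // first char of the label within S[i]
    int k; // last char of the label within S[i]
} Label;

// Copy the label's characters into buf as a NUL-terminated string.
// buf must have room for (k - j + 2) bytes.
void labelToString(const char *S[], Label l, char *buf) {
    size_t len = (size_t)(l.k - l.j + 1);
    memcpy(buf, S[l.i] + l.j, len);
    buf[len] = '\0';
}
```

With S = {"see", "bear", "sell", "stock"}, the label (2, 2, 3) stands for "ll". Each node then has constant size regardless of how many characters its label covers.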

COMP2521 21T2 - 9.3 - Tries