CPSC 331: Tutorial 11
Hashing, Collision Resolution and other stuff
PhD Student
Spring 2018
Today
Review, homework help, hot button issues on HW2 (time permitting).
Hash tables
We've seen arrays and lists. These have linear search times or, if you're lucky, \(O(\log_2 n)\) search time (e.g. binary search on a sorted array).
This isn't really that good.
Computers (at least in this decade) have a lot of memory.
DRAM shortages aside, memory is cheap. Can we use more memory to get faster data look-ups?
Yes, this is what hash tables do.
Hash tables
In practice, hash tables are just arrays.
// In Java, this might look like
Object[] hashTable = new Object[hashTableSize];
\(hashTable=\) [null, null, null, null, null, null, null, null, null, null]
We need to make an assumption about objects here: we assume they're hashable.
In Java, hashable objects override the hashCode method (together with equals, so that equal objects get equal hash codes). If you want to hash an object you defined, you have to implement this method.
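As a minimal sketch of what "implementing this method" could look like, here is a hypothetical Student class (the class, its fields and the slotFor helper are illustrative assumptions, not part of the tutorial). It overrides equals and hashCode on the same fields and uses Math.floorMod to turn a hash code into a table index, since hashCode() can be negative in general.

```java
import java.util.Objects;

// Hypothetical example class; not taken from the tutorial.
final class Student {
    private final String name;
    private final int id;

    Student(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Equal objects must produce equal hash codes, so equals and
    // hashCode are overridden together and use the same fields.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        Student s = (Student) other;
        return id == s.id && name.equals(s.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    // hashCode() may be negative and % keeps the sign in Java,
    // so floorMod is used to get an index in [0, tableSize).
    static int slotFor(Object o, int tableSize) {
        return Math.floorMod(o.hashCode(), tableSize);
    }
}
```

Two Student objects with the same name and id then compare equal and land in the same slot.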
Hash tables
// In Java, this might look like
Object[] hashTable = new Object[hashTableSize];
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac"; // str2.hashCode() % 10 = 1
String str3 = "Yohan";
hashTable[str1.hashCode() % 10] = str1;
\(hashTable=\) [null, null, null, null, null, null, null, null, null, "Katherine"]
Hash tables
// In Java, this might look like
Object[] hashTable = new Object[hashTableSize];
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac"; // str2.hashCode() % 10 = 1
String str3 = "Yohan";
hashTable[str1.hashCode() % 10] = str1;
hashTable[str2.hashCode() % 10] = str2;
\(hashTable=\) [null, "Isaac", null, null, null, null, null, null, null, "Katherine"]
Why is this useful?
Hash tables
\(hashTable=\) [null, "Isaac", null, null, null, null, null, null, null, "Katherine"]
We use this as a means of creating a mapping between objects. Lookups in this mapping are fast.
Here we mapped a name to itself, but we could also map it to data. A cache is one example.
Hash tables
(Figure: the table entries are now "Katherine.html" + file data and "Isaac.html" + file data.)
Think of an HTTP server. If we request "Isaac.html":
- we compute the hashCode() of "Isaac.html"
- we check to see if Isaac.html is in the hash table
- if it is, we return the data in the table
- if it isn't, we read the file, add it to the hash table, and return the data
A lookup in the table is much faster than reading a file from a hard drive.
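The cache steps above can be sketched roughly as follows. This is only an illustration: FileCache, slowLoader and the loads counter are hypothetical names, Java's built-in HashMap stands in for the hand-built table, and the "read the file" step is passed in as a function so the sketch stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache sketch; all names here are illustrative.
final class FileCache {
    private final Map<String, String> table = new HashMap<>();
    private final Function<String, String> slowLoader; // stands in for reading from disk
    int loads = 0; // how many times the slow path was taken

    FileCache(Function<String, String> slowLoader) {
        this.slowLoader = slowLoader;
    }

    String get(String fileName) {
        String data = table.get(fileName);   // hash the name and look it up
        if (data == null) {                   // not in the table yet
            data = slowLoader.apply(fileName); // slow path: "read the file"
            loads++;
            table.put(fileName, data);        // remember it for next time
        }
        return data;
    }
}
```

Requesting the same name twice only triggers the slow loader once; the second request is answered from the table.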
Hash tables
// In Java, this might look like
Object[] hashTable = new Object[hashTableSize];
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac"; // str2.hashCode() % 10 = 1
String str3 = "Yohan"; // str3.hashCode() % 10 = 1
hashTable[str1.hashCode() % 10] = str1;
hashTable[str2.hashCode() % 10] = str2;
hashTable[str3.hashCode() % 10] = str3;
\(hashTable=\) [null, "Isaac" or "Yohan"?, null, null, null, null, null, null, null, "Katherine"]
"Isaac" and "Yohan" both hash to slot 1, so they compete for the same spot in the table (as written, the last assignment simply overwrites "Isaac"). This is a collision.
Hash tables
\(hashTable=\) [null, "Isaac", null, null, null, null, null, null, null, "Katherine"], and "Yohan" also hashes to slot 1.
We search for the next open spot in the table and place the object there. This is linear probing.
\(hashTable=\) [null, "Isaac", "Yohan", null, null, null, null, null, null, "Katherine"]
Hash tables
We search for the next open spot in the table and place the object there. This is linear probing.
\(hashTable=\) [null, "Isaac", "Yohan", null, null, null, null, null, null, "Katherine"]
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful | Unsuccessful |
---|---|
\(\frac{1}{2}\left(1 + \frac{1}{1-\alpha} \right)\) | \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2} \right)\) |
See lecture slides for details.
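A minimal sketch of linear probing follows, assuming a fixed-size table with no deletion and no resizing. LinearProbingTable and its methods are hypothetical names, and Math.floorMod is used instead of % so that negative hash codes still give a valid index.

```java
// Minimal open-addressing table with linear probing (no deletion, no resizing).
final class LinearProbingTable {
    private final Object[] slots;

    LinearProbingTable(int size) {
        slots = new Object[size];
    }

    // Returns the index where key ends up, or -1 if the table is full.
    int insert(Object key) {
        int home = Math.floorMod(key.hashCode(), slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (home + i) % slots.length; // wrap around the end of the array
            if (slots[idx] == null || slots[idx].equals(key)) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Returns the index of key, or -1 if it is not in the table.
    int find(Object key) {
        int home = Math.floorMod(key.hashCode(), slots.length);
        for (int i = 0; i < slots.length; i++) {
            int idx = (home + i) % slots.length;
            if (slots[idx] == null) return -1;      // an empty slot ends the search
            if (slots[idx].equals(key)) return idx;
        }
        return -1;
    }
}
```

With a table of size 10, inserting the Integer keys 1, 11 and 21 (which all hash to slot 1) places them in slots 1, 2 and 3.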
Hash tables
This tends to cluster entries in the hash table.
\(hashTable=\) [null, "Isaac", "Yohan", null, null, null, null, null, null, "Katherine"] (slots 1 and 2 now form a cluster)
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful | Unsuccessful |
---|---|
\(\frac{1}{2}\left(1 + \frac{1}{1-\alpha} \right)\) | \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2} \right)\) |
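To get a feel for how clustering hurts as the table fills up, the small sketch below evaluates the linear probing estimates, taken here to be \(\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)\) for a successful search and \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)\) for an unsuccessful one. The class and method names are illustrative only.

```java
// Evaluates the average-probe estimates for linear probing at load factor alpha.
final class LinearProbingCost {
    // Expected probes for a successful search, 0 <= alpha < 1.
    static double successful(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    // Expected probes for an unsuccessful search, 0 <= alpha < 1.
    static double unsuccessful(double alpha) {
        double free = 1.0 - alpha;
        return 0.5 * (1.0 + 1.0 / (free * free));
    }
}
```

At \(\alpha = 0.5\) this gives 1.5 and 2.5 probes. At \(\alpha = 0.9\) it gives about 5.5 and 50.5, which is why tables using linear probing are usually kept well below full.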
Hash tables
Instead of trying to insert at \(H\)(str) + 1, \(H\)(str) + 2, \(H\)(str) + 3, ..., we step quadratically, i.e. \(H\)(str) + \(1^2\), \(H\)(str) - \(2^2\), \(H\)(str) + \(3^2\), ...
Notice the - sign: we alternate between + and -.
This is quadratic probing.
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful | Unsuccessful |
---|---|
\(1 - \ln(1-\alpha) - \frac{\alpha}{2}\) | \(\frac{1}{1-\alpha} - \alpha - \ln(1-\alpha)\) |
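The alternating probe sequence described above could be computed along these lines. This is a sketch under the assumption that the i-th probe uses offset \(+i^2\) for odd i and \(-i^2\) for even i, wrapped into the table; QuadraticProbe and probe are hypothetical names.

```java
final class QuadraticProbe {
    // Index of the i-th probe (i = 0 is the home slot) in a table of size m,
    // using the alternating offsets +1^2, -2^2, +3^2, ...
    static int probe(int home, int i, int m) {
        int offset = i * i;
        if (i % 2 == 0) offset = -offset; // even steps go backwards
        return Math.floorMod(home + offset, m); // floorMod handles negative values
    }
}
```

For a home slot of 1 in a table of size 10, the first probes land at slots 1, 2, 7 and 0. Unlike linear probing, a sequence like this may not visit every slot, depending on the table size.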
Hash tables
Instead of trying to insert at \(H\)(str) + 1, \(H\)(str) + 2, \(H\)(str) + 3, ..., we use a second hash function \(H_2\) and try to insert at \(H\)(str) + \(H_2\)(str), \(H\)(str) + 2\(H_2\)(str), \(H\)(str) + 3\(H_2\)(str), ...
This is double hashing.
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful | Unsuccessful |
---|---|
\(\frac{1}{\alpha}\ln\left(\frac{1}{1-\alpha}\right)\) | \(\frac{1}{1-\alpha}\) |
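A rough sketch of the double hashing probe sequence is shown below. The particular second hash function h2 is an assumption chosen for illustration (the slides do not specify one); the important property is that it never returns 0, otherwise the probe would never move. Choosing a prime table size m is a common way to make the sequence reach every slot.

```java
final class DoubleHashing {
    // Hypothetical second hash: always in [1, m - 1], never 0.
    static int h2(Object key, int m) {
        return 1 + Math.floorMod(key.hashCode(), m - 1);
    }

    // Index of the i-th probe: H(key) + i * H2(key), wrapped into the table.
    static int probe(Object key, int i, int m) {
        int h1 = Math.floorMod(key.hashCode(), m);
        long idx = h1 + (long) i * h2(key, m); // long avoids int overflow
        return (int) (idx % m);
    }
}
```

For the Integer key 12 and m = 11, the home slot is 1 and the step is 3, so the probes visit slots 1, 4, 7, 10, 2, and so on.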
Hash tables
\(hashTable=\) [null, "Isaac" -> "Yohan", null, null, null, null, null, null, null, "Katherine"]
Alternatively, don't worry about probing: keep all the entries that hash to the same slot in a linked list.
Hash tables
// In Java, this might look like
ArrayList<LinkedList<Object>> hashTable =
new ArrayList<LinkedList<Object>>(10);
for (int i = 0; i < 10; i++) {
hashTable.add(new LinkedList<Object>());
}
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac"; // str2.hashCode() % 10 = 1
String str3 = "Yohan"; // str3.hashCode() % 10 = 1
hashTable.get(str1.hashCode() % 10).add(str1);
hashTable.get(str2.hashCode() % 10).add(str2);
hashTable.get(str3.hashCode() % 10).add(str3);
This is with a linked list
Hash tables
// In Java, this might look like
ArrayList<ArrayList<Object>> hashTable =
new ArrayList<ArrayList<Object>>(10);
for (int i = 0; i < 10; i++) {
hashTable.add(new ArrayList<Object>());
}
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac"; // str2.hashCode() % 10 = 1
String str3 = "Yohan"; // str3.hashCode() % 10 = 1
hashTable.get(str1.hashCode() % 10).add(str1);
hashTable.get(str2.hashCode() % 10).add(str2);
hashTable.get(str3.hashCode() % 10).add(str3);
This is with a resizable list
Hash tables
\(hashTable=\) [null, "Isaac" -> "Yohan", null, null, null, null, null, null, null, "Katherine"]
If we're searching for something in the hash table, we hash the object, then search the list stored in that slot. On average this takes \(\Theta(1+\alpha)\) time. What if this list was sorted? What about deletion?
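Search and deletion with chaining might be sketched as follows. ChainedTable and its method names are hypothetical, and Math.floorMod is used so that negative hash codes still map to a valid bucket. Deletion here is simple: remove the entry from its bucket's list, with no effect on other slots.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical chained hash table holding plain objects (a set, not a map).
final class ChainedTable {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) buckets.add(new LinkedList<>());
    }

    // Hash the key to pick the one list that could contain it.
    private LinkedList<Object> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void insert(Object key) {
        LinkedList<Object> bucket = bucketFor(key);
        if (!bucket.contains(key)) bucket.add(key);
    }

    // Only the one bucket is scanned, not the whole table.
    boolean contains(Object key) {
        return bucketFor(key).contains(key);
    }

    // Returns true if the key was present and has been removed.
    boolean delete(Object key) {
        return bucketFor(key).remove(key);
    }
}
```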
Heaps
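A heap is a complete binary tree (every level is full except possibly the last, which is filled from the left) with the property that every node is greater than or equal to its children (max heap).
(Tree: root 10 with children 3 and 5; 3 has children 1 and 2.)
As an array this looks like A=[10,3,5,1,2]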
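The array form works because the tree is stored level by level. Assuming 0-based indexing (an assumption; the slides do not state the convention), the children of index i sit at 2i+1 and 2i+2, and the parent at (i-1)/2. The sketch below, with hypothetical names, uses that to check the max heap property.

```java
final class HeapIndex {
    static int parent(int i) { return (i - 1) / 2; }
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }

    // True if every element is <= its parent, i.e. the array is a max heap.
    static boolean isMaxHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[parent(i)] < a[i]) return false;
        }
        return true;
    }
}
```

For A=[10,3,5,1,2] the check succeeds: 3 and 5 sit under 10, and 1 and 2 sit under 3.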
Heapsort
Take the top of the tree off the heap, swap it with the last element, and reheap-ify
(Tree: root 10 with children 3 and 5; 3 has children 1 and 2.)
As an array, we swap the first element with the last one, then reheap-ify on a list that is one element shorter.
Heapsort
Take the top of the tree off the heap, swap it with the last element, and reheap-ify
(Tree after the swap: root 2 with children 3 and 5; 3 has child 1. The 10 is now outside the heap.)
As an array this looks like A=[2,3,5,1,10]
Heapsort
Take the top of the tree off the heap, swap it with the last element, and reheap-ify
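(Tree after reheap-ifying: the 2 is swapped with its larger child 5, giving root 5 with children 3 and 2; 3 has child 1.)
As an array this looks like A=[5,3,2,1,10]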
Heapsort
Take the top of the tree off the heap, swap it with the last element, and reheap-ify
As an array this looks like A=[1,2,3,5,10]
Heapsort
Keep doing this... (the sorted part of the array grows from index n-1 down to index 0)
As an array this looks like A=[1,2,3,5,10]
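Putting the steps together, an in-place heapsort on an int array could look like the sketch below. It assumes the 0-based array layout (children of i at 2i+1 and 2i+2), and the names HeapSort, siftDown and sort are illustrative. siftDown plays the role of "reheap-ify": it pushes the root down, always swapping with the larger child.

```java
final class HeapSort {
    // Restores the max heap property for the subtree rooted at i,
    // looking only at the first `size` elements of the array.
    static void siftDown(int[] a, int i, int size) {
        while (true) {
            int largest = i;
            int l = 2 * i + 1;
            int r = 2 * i + 2;
            if (l < size && a[l] > a[largest]) largest = l;
            if (r < size && a[r] > a[largest]) largest = r;
            if (largest == i) return; // both children are smaller: done
            int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
            i = largest;
        }
    }

    static void sort(int[] a) {
        // Build a max heap from an arbitrary array.
        for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length);
        // Repeatedly move the maximum to the end and shrink the heap.
        for (int end = a.length - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            siftDown(a, 0, end);
        }
    }
}
```

Calling HeapSort.sort on {10, 3, 5, 1, 2} leaves the array as [1, 2, 3, 5, 10].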
By Joshua Horacsek