Hashing, Collision Resolution and other stuff
PhD Student
Spring 2018
Review, homework help, hot button issues on HW2 (time permitting).
We've seen arrays and lists. These have linear search times or, if you're lucky, \(O(\log_2 n)\) search time (e.g., binary search on a sorted array).
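For comparison, here is a minimal sketch of the two search times mentioned above. The class and method names (`SearchDemo`, `linearSearch`, `binarySearch`) are illustrative, not from the course; `Arrays.binarySearch` is the JDK's built-in version. Binary search only works because the array is sorted.

```java
import java.util.Arrays;

// Illustrative comparison of linear search and binary search on an int array.
class SearchDemo {
    // Linear search: check every element in turn, O(n) in the worst case.
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1; // not found
    }

    // Binary search: halve the search range each step, O(log2 n); needs a sorted array.
    static int binarySearch(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow of (lo + hi)
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 5, 10};
        System.out.println(linearSearch(a, 5));        // 3
        System.out.println(binarySearch(a, 5));        // 3
        System.out.println(Arrays.binarySearch(a, 5)); // 3
    }
}
```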
This isn't really that good.
Computers (at least in this decade) have a lot of memory.
DRAM shortages aside, memory is cheap. Can we use more memory to get faster data lookups?
Yes, this is what hash tables do.
In practice, hash tables are just arrays.
// In Java, this might look like
int hashTableSize = 10;
Object[] hashTable = new Object[hashTableSize]; // every slot starts out as null
[Figure: hashTable drawn as 10 empty (null) slots]
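We need to make an assumption about objects here: we assume they're hashable.
In Java, every class inherits hashCode() from Object, but that default is typically based on object identity rather than contents. If you want to hash objects of a class you defined, so that equal objects land in the same slot, you have to override hashCode() (and equals(), which must agree with it).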
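A minimal sketch of such an override, assuming a hypothetical `Point` class (not part of the slides). The key rule is that two objects that are `equals` must return the same `hashCode()`; `java.util.Objects.hash` is a JDK helper that combines several fields into one int.

```java
import java.util.Objects;

// Hypothetical example class: two Points with the same coordinates should hash alike.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Equality is based on the coordinates, not on object identity.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Must agree with equals: equal Points produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Point a = new Point(3, 4);
        Point b = new Point(3, 4);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Without the `hashCode()` override, two separately created `Point(3, 4)` objects would usually land in different slots, so a lookup with a fresh object would miss.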
// In Java, this might look like
int hashTableSize = 10;
Object[] hashTable = new Object[hashTableSize];
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";
hashTable[str1.hashCode() % 10] = str1;
[Figure: hashTable with the String object "Katherine" in slot 9; all other slots null]
// In Java, this might look like
int hashTableSize = 10;
Object[] hashTable = new Object[hashTableSize];
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";
hashTable[str1.hashCode() % 10] = str1;
hashTable[str2.hashCode() % 10] = str2;
[Figure: hashTable with "Isaac" in slot 1 and "Katherine" in slot 9]
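One caution about the index computation: `String.hashCode()` can be negative for other strings, and Java's `%` keeps the sign of the left operand, so `hashCode() % 10` can produce a negative (invalid) array index. A minimal sketch, assuming a hypothetical helper `slotFor`, uses `Math.floorMod`, which always returns a value in `[0, size)` for a positive size.

```java
// slotFor is a hypothetical helper, not part of the slides.
class SlotIndex {
    // Math.floorMod returns a value in [0, size) even for a negative hash code,
    // whereas % would return a negative remainder.
    static int slotFor(Object key, int size) {
        return Math.floorMod(key.hashCode(), size);
    }

    public static void main(String[] args) {
        System.out.println(slotFor("Katherine", 10)); // 9
        System.out.println(slotFor("Isaac", 10));     // 1
        System.out.println(-7 % 10);                  // -7
        System.out.println(Math.floorMod(-7, 10));    // 3
    }
}
```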
Why is this useful?
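We use this as a means of creating a mapping between objects. Lookups in this mapping are fast: finding an object's slot takes one hash computation and one array access, no matter how many objects are stored (as long as there are no collisions).
We mapped a name to itself, but we could also map it to data. One example is a cache.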
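A minimal sketch of mapping a key to data rather than to itself, assuming a hypothetical `TinyMap` class that stores each key next to its value. Collisions are deliberately ignored here: a later key that hashes to the same slot simply overwrites the earlier one.

```java
// Hypothetical minimal key -> value table; no collision handling.
class TinyMap {
    private final Object[] keys;
    private final Object[] values;

    TinyMap(int size) {
        keys = new Object[size];
        values = new Object[size];
    }

    private int slotFor(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Stores the key and its value side by side in the same slot.
    void put(Object key, Object value) {
        int i = slotFor(key);
        keys[i] = key;
        values[i] = value;
    }

    // Returns the stored value, or null if this key is not in its slot.
    Object get(Object key) {
        int i = slotFor(key);
        return key.equals(keys[i]) ? values[i] : null;
    }

    public static void main(String[] args) {
        TinyMap m = new TinyMap(10);
        m.put("Katherine", "data for Katherine");
        m.put("Isaac", "data for Isaac");
        System.out.println(m.get("Isaac"));  // data for Isaac
        System.out.println(m.get("Nobody")); // null
    }
}
```

Storing the key alongside the value lets `get` confirm it found the right entry instead of whatever happens to sit in that slot.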
[Figure: hashTable mapping "Isaac.html" and "Katherine.html" to their file data]
Think of an HTTP server. If we request "Isaac.html", we hash the name, jump straight to its slot, and return the stored file data.
This is faster than reading the file from a hard drive.
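A minimal sketch of that cache using the JDK's `HashMap`, assuming a hypothetical `readFromDisk` method that stands in for the slow file read. `Map.computeIfAbsent` only calls the loader when the key is not already in the table, so repeated requests skip the "disk".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical page cache: the first request reads "from disk", later ones hit the table.
class PageCache {
    private final Map<String, String> cache = new HashMap<>();
    int diskReads = 0; // counts how often the slow path runs

    // Stand-in for reading a file from the hard drive (hypothetical).
    private String readFromDisk(String fileName) {
        diskReads++;
        return "<contents of " + fileName + ">";
    }

    // Hash lookup first; fall back to the disk only on a miss.
    String get(String fileName) {
        return cache.computeIfAbsent(fileName, this::readFromDisk);
    }

    public static void main(String[] args) {
        PageCache server = new PageCache();
        server.get("Isaac.html");
        server.get("Isaac.html");
        System.out.println(server.diskReads); // 1
    }
}
```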
// In Java, this might look like
int hashTableSize = 10;
Object[] hashTable = new Object[hashTableSize];
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";     // str3.hashCode() % 10 = 1
hashTable[str1.hashCode() % 10] = str1;
hashTable[str2.hashCode() % 10] = str2;
hashTable[str3.hashCode() % 10] = str3; // same slot as "Isaac": this overwrites it
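[Figure: hashTable with "Isaac" in slot 1 and "Katherine" in slot 9; "Yohan" also hashes to slot 1]
"Isaac" and "Yohan" both hash to slot 1, so they would occupy the same spot in the table. This is a collision.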
[Figure: "Yohan" cannot go in slot 1, which already holds "Isaac"; "Katherine" is in slot 9]
We search for the next open spot in the table and place the object there; this is linear probing.
[Figure: after linear probing, "Isaac" is in slot 1, "Yohan" in slot 2, and "Katherine" in slot 9]
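A minimal sketch of linear probing for a table of Strings, assuming a hypothetical `LinearProbingTable` class with no resizing and no deletion. Insertion tries H(key), H(key)+1, H(key)+2, ..., wrapping around the end of the array; a search follows the same sequence and stops at the first empty slot.

```java
// Hypothetical linear-probing set of Strings (sketch: no resizing, no deletion).
class LinearProbingTable {
    private final String[] slots;

    LinearProbingTable(int size) {
        slots = new String[size];
    }

    private int home(String key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Tries H(key), H(key)+1, H(key)+2, ... wrapping around.
    // Returns the slot used, or -1 if the table is full.
    int insert(String key) {
        int start = home(key);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            if (slots[idx] == null || slots[idx].equals(key)) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    boolean contains(String key) {
        int start = home(key);
        for (int i = 0; i < slots.length; i++) {
            int idx = (start + i) % slots.length;
            if (slots[idx] == null) {
                return false;
            }
            if (slots[idx].equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        System.out.println(t.insert("Katherine")); // 9
        System.out.println(t.insert("Isaac"));     // 1
        System.out.println(t.insert("Yohan"));     // 2 (slot 1 is taken)
        System.out.println(t.contains("Yohan"));   // true
    }
}
```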
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful: \(\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)\)
Unsuccessful: \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)\)
See lecture slides for details.
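Plugging two load factors into the standard linear-probing estimates, \(\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)\) probes for a successful search and \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)\) for an unsuccessful one, shows how quickly unsuccessful searches degrade as the table fills (successful value first, unsuccessful second):

```latex
\begin{aligned}
\alpha = 0.5:&\quad \tfrac{1}{2}\left(1 + \tfrac{1}{1-0.5}\right) = \tfrac{1}{2}(1+2) = 1.5,
  &\quad \tfrac{1}{2}\left(1 + \tfrac{1}{(1-0.5)^2}\right) = \tfrac{1}{2}(1+4) = 2.5\\
\alpha = 0.9:&\quad \tfrac{1}{2}\left(1 + \tfrac{1}{1-0.9}\right) = \tfrac{1}{2}(1+10) = 5.5,
  &\quad \tfrac{1}{2}\left(1 + \tfrac{1}{(1-0.9)^2}\right) = \tfrac{1}{2}(1+100) = 50.5
\end{aligned}
```

Going from half full to 90% full multiplies the expected cost of an unsuccessful search by about 20.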
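This tends to cluster entries in the hash table: occupied slots form long runs, and any new key that hashes into a run must probe to its end, which makes the run even longer.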
Instead of trying to insert into \(H(str)+1, H(str)+2, H(str)+3, \ldots\), we probe quadratically, i.e. \(H(str)+1^2, H(str)-2^2, H(str)+3^2, \ldots\) (all indices taken modulo the table size).
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful: \(1 - \ln(1-\alpha) - \frac{\alpha}{2}\)
Unsuccessful: \(\frac{1}{1-\alpha} - \alpha - \ln(1-\alpha)\)
This is quadratic probing.
Notice the minus sign: we alternate between + and -.
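A minimal sketch of quadratic probing that follows the slide's sequence \(+1^2, -2^2, +3^2, -4^2, \ldots\), assuming a hypothetical `QuadraticProbingTable` class. `Math.floorMod` wraps negative positions back into the table. This sequence is not guaranteed to reach every slot (prime table sizes are usually recommended for quadratic probing), so the sketch gives up after `slots.length` probes and reports failure.

```java
// Hypothetical quadratic-probing table using the slide's alternating sequence.
class QuadraticProbingTable {
    private final Object[] slots;

    QuadraticProbingTable(int size) {
        slots = new Object[size];
    }

    // Index of the i-th probe; i = 0 is the home slot H(key).
    // Odd i adds i*i, even i subtracts it.
    private int probe(Object key, int i) {
        long home = Math.floorMod(key.hashCode(), slots.length);
        long offset = (long) i * i; // long avoids overflow for large i
        long signed = (i % 2 == 1) ? offset : -offset;
        return (int) Math.floorMod(home + signed, (long) slots.length);
    }

    // Returns the slot used, or -1 if no free slot turned up in slots.length probes.
    int insert(Object key) {
        for (int i = 0; i < slots.length; i++) {
            int idx = probe(key, i);
            if (slots[idx] == null || slots[idx].equals(key)) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Searches along the same probe sequence; an empty slot ends the search.
    boolean contains(Object key) {
        for (int i = 0; i < slots.length; i++) {
            int idx = probe(key, i);
            if (slots[idx] == null) {
                return false;
            }
            if (slots[idx].equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so 1, 11 and 21 all have H = 1.
        QuadraticProbingTable t = new QuadraticProbingTable(10);
        System.out.println(t.insert(1));  // 1
        System.out.println(t.insert(11)); // 2 (1 + 1^2)
        System.out.println(t.insert(21)); // 7 (1 - 2^2 = -3, which wraps to 7)
    }
}
```

Unlike linear probing, the third colliding key jumps away from slot 1 instead of extending the run next to it.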
Instead of trying to insert into \(H(str)+1, H(str)+2, H(str)+3, \ldots\), we use a second hash function \(H_2\) and try \(H(str)+H_2(str), H(str)+2H_2(str), H(str)+3H_2(str), \ldots\) (again modulo the table size).
For a given load factor \(\alpha = \frac{\text{number of occupied cells}}{\text{table size}}\), the average number of probes during a search is:
Successful: \(\frac{1}{\alpha}\ln\left(\frac{1}{1-\alpha}\right)\)
Unsuccessful: \(\frac{1}{1-\alpha}\)
This is double hashing
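A minimal sketch of double hashing, assuming a prime table size and a hypothetical second hash \(H_2(key) = 1 + (hash \bmod (size-1))\). That choice of \(H_2\) is never 0 (a step of 0 would probe the same slot forever) and is always smaller than the prime size, so each probe sequence visits every slot before repeating.

```java
// Hypothetical double-hashing table; assumes the size passed in is prime.
class DoubleHashingTable {
    private final Object[] slots;

    DoubleHashingTable(int primeSize) {
        slots = new Object[primeSize];
    }

    // First hash: the home slot H(key).
    private int h(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    // Hypothetical second hash H2(key): a step size between 1 and size - 1.
    private int h2(Object key) {
        return 1 + Math.floorMod(key.hashCode(), slots.length - 1);
    }

    // Tries H, H + H2, H + 2*H2, ... (mod size). Returns the slot used, or -1 if full.
    int insert(Object key) {
        int start = h(key);
        int step = h2(key);
        for (int i = 0; i < slots.length; i++) {
            int idx = (int) ((start + (long) i * step) % slots.length);
            if (slots[idx] == null || slots[idx].equals(key)) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // With size 11, the keys 1, 12 and 23 all have H = 1 but different steps.
        DoubleHashingTable t = new DoubleHashingTable(11);
        System.out.println(t.insert(1));  // 1
        System.out.println(t.insert(12)); // 4 (step H2 = 3)
        System.out.println(t.insert(23)); // 5 (step H2 = 4)
    }
}
```

Keys that collide on \(H\) usually get different steps, so they spread out instead of following each other along one run.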
[Figure: hashTable where slot 1 holds the list "Isaac" → "Yohan" and slot 9 holds "Katherine"]
We don't need to probe at all if each slot keeps its entries in a linked list (separate chaining).
// In Java, this might look like
import java.util.ArrayList;
import java.util.LinkedList;

// new ArrayList<...>(10) only reserves capacity, so the loop adds one empty list per slot
ArrayList<LinkedList<Object>> hashTable =
    new ArrayList<LinkedList<Object>>(10);
for (int i = 0; i < 10; i++) {
    hashTable.add(new LinkedList<Object>());
}
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";     // str3.hashCode() % 10 = 1
hashTable.get(str1.hashCode() % 10).add(str1);
hashTable.get(str2.hashCode() % 10).add(str2);
hashTable.get(str3.hashCode() % 10).add(str3); // joins "Isaac" in slot 1's list
This is with a linked list
// In Java, this might look like
import java.util.ArrayList;

// new ArrayList<...>(10) only reserves capacity, so the loop adds one empty list per slot
ArrayList<ArrayList<Object>> hashTable =
    new ArrayList<ArrayList<Object>>(10);
for (int i = 0; i < 10; i++) {
    hashTable.add(new ArrayList<Object>());
}
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";     // str3.hashCode() % 10 = 1
hashTable.get(str1.hashCode() % 10).add(str1);
hashTable.get(str2.hashCode() % 10).add(str2);
hashTable.get(str3.hashCode() % 10).add(str3);
This is with a resizable list
[Figure: the chained hashTable, with "Isaac" and "Yohan" in slot 1's list and "Katherine" in slot 9's list]
If we're searching for something in the hash table, we hash the object, then search that slot's list. Assuming a good hash function, this takes \(\Theta(1+\alpha)\) time on average. What if this list were sorted? What about deletion?
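A minimal sketch of search and deletion with separate chaining, assuming a hypothetical `ChainedTable` class (no resizing). Search hashes once and scans only that slot's list; deletion just removes the key from its list.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical separate-chaining table with search and deletion.
class ChainedTable {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Hash once to pick the slot's list.
    private LinkedList<Object> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void insert(Object key) {
        LinkedList<Object> bucket = bucketFor(key);
        if (!bucket.contains(key)) { // avoid storing duplicates
            bucket.add(key);
        }
    }

    // Scans only the one list the key could be in.
    boolean contains(Object key) {
        return bucketFor(key).contains(key);
    }

    // Removes the key from its list; returns false if it was not there.
    boolean delete(Object key) {
        return bucketFor(key).remove(key);
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(10);
        t.insert("Katherine");
        t.insert("Isaac");
        t.insert("Yohan");
        System.out.println(t.delete("Isaac"));   // true
        System.out.println(t.contains("Yohan")); // true
        System.out.println(t.contains("Isaac")); // false
    }
}
```

Deletion is simple here because no other entry has to move. With open addressing, simply clearing a slot can break the probe sequence for keys that were placed after it.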
A heap is a complete binary tree (every level is full except possibly the last, which is filled from left to right) with the property that each node is greater than or equal to its children (max heap).
[Figure: the max heap drawn as a tree: root 10, children 3 and 5, and 3's children 1 and 2]
As an array this looks like A=[10,3,5,1,2]
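In the array form, no pointers are needed: with 0-based indices, the children of A[i] are A[2i+1] and A[2i+2]. A minimal sketch, assuming a hypothetical `isMaxHeap` helper, checks the max-heap property using only that index arithmetic.

```java
// Hypothetical checker for the max-heap property of an array-stored heap.
class HeapCheck {
    // Children of a[i] are a[2i + 1] and a[2i + 2] (0-based), within a[0..size-1].
    static boolean isMaxHeap(int[] a, int size) {
        for (int i = 0; i < size; i++) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (left < size && a[left] > a[i]) {
                return false;
            }
            if (right < size && a[right] > a[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMaxHeap(new int[]{10, 3, 5, 1, 2}, 5)); // true
        System.out.println(isMaxHeap(new int[]{2, 3, 5, 1}, 4));     // false
    }
}
```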
Take the top of the tree (the maximum) off the heap: swap it with the last element, and re-heapify.
As an array, we swap A[0] with the last element of the heap, then re-heapify on a list that is one element shorter.
After the swap, the array looks like A=[2,3,5,1,10]; 10 is now in its final position.
After re-heapifying A[0..3] (2 sinks below its larger child, 5), the array looks like A=[5,3,2,1,10].
Swapping 5 with the last heap element gives A=[1,3,2,5,10].
Keep doing this... (the sorted region grows from the end of the array, index n-1, down to index 0)
At the end, the array is sorted: A=[1,2,3,5,10].
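A minimal sketch of the whole procedure (heapsort), assuming hypothetical helper names `siftDown` and `extractMaxStep`. `extractMaxStep` is one step from the walk-through: swap the maximum A[0] with the last heap element, then sift the new root down within the shorter heap.

```java
import java.util.Arrays;

// Hypothetical heapsort sketch on a max heap stored in an array (0-based indices).
class HeapSort {
    // Moves a[i] down until it is >= both children, within a[0..size-1].
    static void siftDown(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left < size && a[left] > a[largest]) {
                largest = left;
            }
            if (right < size && a[right] > a[largest]) {
                largest = right;
            }
            if (largest == i) {
                return;
            }
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    // One step: swap the max with the last heap element, re-heapify the shorter heap.
    static void extractMaxStep(int[] a, int size) {
        int tmp = a[0];
        a[0] = a[size - 1];
        a[size - 1] = tmp;
        siftDown(a, 0, size - 1);
    }

    static void heapSort(int[] a) {
        // Build the max heap bottom-up (the slides' example is already a heap).
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
        // The sorted region grows from index n-1 down to index 0.
        for (int size = a.length; size > 1; size--) {
            extractMaxStep(a, size);
        }
    }

    public static void main(String[] args) {
        int[] a = {10, 3, 5, 1, 2};
        extractMaxStep(a, 5);
        System.out.println(Arrays.toString(a)); // [5, 3, 2, 1, 10]
        int[] b = {10, 3, 5, 1, 2};
        heapSort(b);
        System.out.println(Arrays.toString(b)); // [1, 2, 3, 5, 10]
    }
}
```

Each step costs \(O(\log n)\) for the sift-down, so sorting n elements takes \(O(n \log n)\) time and no extra array.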