CPSC 331: Tutorial 11

Hashing, Collision Resolution and other stuff

PhD Student

Spring 2018

Today

Review, homework help, hot button issues on HW2 (time permitting).

Hash tables

We've seen arrays and lists. These have linear search times, or, if you're lucky, \(O(\log_2 n)\) search time (e.g. binary search on a sorted array).

This isn't really that good.

Computers (at least in this decade) have a lot of memory.

DRAM shortages aside, memory is cheap. Can we use more memory to get faster data lookups?

Yes, this is what hash tables do.

Hash tables

In practice, hash tables are just arrays.

// In Java, this might look like
Object[] hashTable = new Object[hashTableSize];

\(hashTable=\) [null, null, null, null, null, null, null, null, null, null]

We need to make an assumption about the objects here: we assume they're hashable.

Hashable objects override the hashCode method (and equals, so that two equal objects get the same hash code). If you want to hash an object you defined, you have to implement this method.
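As a minimal sketch of what "hashable" means in Java, here is a hypothetical Student class (the name and fields are illustrative, not from the course material) that overrides hashCode together with equals:

```java
import java.util.Objects;

// Hypothetical example class; the name and fields are illustrative only.
final class Student {
    private final String name;
    private final int id;

    Student(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Equal objects must produce equal hash codes, so hashCode and equals
    // are derived from the same fields.
    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        Student s = (Student) other;
        return id == s.id && name.equals(s.name);
    }

    public static void main(String[] args) {
        Student a = new Student("Katherine", 1);
        Student b = new Student("Katherine", 1);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Two Student objects built from the same name and id compare as equal and land in the same table cell, which is what a hash table lookup relies on.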

Hash tables

// In Java, this might look like
Object[] hashTable = new Object[hashTableSize]; // hashTableSize = 10 here

String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";

hashTable[str1.hashCode() % 10] = str1;

\(hashTable=\) [null, null, null, null, null, null, null, null, null, "Katherine" (String Object)]

Hash tables

// In Java, this might look like
Object[] hashTable = new Object[hashTableSize]; // hashTableSize = 10 here

String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";

hashTable[str1.hashCode() % 10] = str1;
hashTable[str2.hashCode() % 10] = str2;

\(hashTable=\) [null, "Isaac", null, null, null, null, null, null, null, "Katherine"]

Why is this useful?

Hash tables

\(hashTable=\) [null, "Isaac", null, null, null, null, null, null, null, "Katherine"]

We use this as a means of creating a mapping between objects. Lookups in this mapping are fast.

We mapped a name to itself, but we could also map it to data. An example is a cache.

Hash tables

\(hashTable=\) [null, "Isaac.html" + file data, null, null, null, null, null, null, null, "Katherine.html" + file data]

Think of an HTTP server. If we request "Isaac.html":

We compute the hashCode() of "Isaac.html".
We check to see if Isaac.html is in the hash table.
If it is, we return the data in the table.
If it isn't, we read the file and add it to the hash table.

Returning data from the table is faster than reading a file from a hard drive.
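The lookup steps above can be sketched with Java's built-in HashMap. The class name FileCache and the loader function standing in for the disk read are assumptions made for illustration, not part of any real HTTP server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: maps a file name to its contents.
final class FileCache {
    private final Map<String, String> table = new HashMap<>();
    private final Function<String, String> loader; // stands in for a slow disk read
    private int diskReads = 0;

    FileCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String fileName) {
        String data = table.get(fileName); // hash the name, check the table
        if (data == null) {                // miss: read the file and remember it
            data = loader.apply(fileName);
            diskReads++;
            table.put(fileName, data);
        }
        return data;                       // hit: served from memory
    }

    int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        FileCache cache = new FileCache(name -> "<html>" + name + "</html>");
        cache.get("Isaac.html");
        cache.get("Isaac.html");
        System.out.println(cache.diskReads()); // 1
    }
}
```

The second request for the same name never reaches the loader, which is the whole point of the cache.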

Hash tables

// In Java, this might look like
Object[] hashTable = new Object[hashTableSize]; // hashTableSize = 10 here

String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";     // str3.hashCode() % 10 = 1

hashTable[str1.hashCode() % 10] = str1;
hashTable[str2.hashCode() % 10] = str2;
hashTable[str3.hashCode() % 10] = str3; // same index as str2

\(hashTable=\) [null, "Isaac" or "Yohan"?, null, null, null, null, null, null, null, "Katherine"]

"Isaac" and "Yohan" map to the same spot in the table (index 1). This is a collision.

Hash tables

\(hashTable=\) [null, "Isaac", null, null, null, null, null, null, null, "Katherine"], and "Yohan" also hashes to index 1.

We search for the next open spot in the table and place the object there. This is linear probing.

\(hashTable=\) [null, "Isaac", "Yohan", null, null, null, null, null, null, "Katherine"]

Hash tables

We search for the next open spot in the table and place the object there. This is linear probing.
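A minimal sketch of that insertion rule, assuming a fixed-size table with no deletion or resizing. The class LinearProbingTable is hypothetical, and Math.floorMod is used instead of % so that negative hash codes still give a valid index.

```java
// Hypothetical fixed-size table using linear probing (no deletion, no resizing).
final class LinearProbingTable {
    private final Object[] cells;

    LinearProbingTable(int size) {
        cells = new Object[size];
    }

    // Inserts key at its home index, or at the next free cell after it,
    // wrapping around the end of the array. Returns the index used,
    // or -1 if the table is full.
    int insert(Object key) {
        int home = Math.floorMod(key.hashCode(), cells.length);
        for (int i = 0; i < cells.length; i++) {
            int index = (home + i) % cells.length;
            if (cells[index] == null) {
                cells[index] = key;
                return index;
            }
        }
        return -1;
    }

    // Searches along the same probe sequence; an empty cell ends the search.
    int find(Object key) {
        int home = Math.floorMod(key.hashCode(), cells.length);
        for (int i = 0; i < cells.length; i++) {
            int index = (home + i) % cells.length;
            if (cells[index] == null) return -1;
            if (cells[index].equals(key)) return index;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(10);
        System.out.println(table.insert("Katherine")); // 9
        System.out.println(table.insert("Isaac"));     // 1
        System.out.println(table.insert("Yohan"));     // 2 (index 1 is taken)
    }
}
```

Searching follows the same probe sequence as inserting, so a key is always found in the cell where the insert left it.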

\(hashTable=\) [null, "Isaac", "Yohan", null, null, null, null, null, null, "Katherine"]

For a given load factor \(\alpha = \frac{\text{number occupied cells}}{\text{table size}}\), the avg # of probes during a search is:

Successful: \(\frac{1}{2}\left(1 + \frac{1}{1-\alpha} \right)\)

Unsuccessful: \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2} \right)\)

See lecture slides for details.
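To get a feel for how quickly linear probing degrades, the sketch below evaluates the standard estimates \(\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)\) (successful) and \(\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)\) (unsuccessful) for a few load factors. The class and method names are made up for this example.

```java
// Hypothetical helper that evaluates the linear probing estimates.
final class LinearProbeEstimate {
    // Average probes for a successful search: (1/2)(1 + 1/(1 - alpha)).
    static double successful(double alpha) {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    // Average probes for an unsuccessful search: (1/2)(1 + 1/(1 - alpha)^2).
    static double unsuccessful(double alpha) {
        double free = 1.0 - alpha;
        return 0.5 * (1.0 + 1.0 / (free * free));
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf("alpha=%.2f successful=%.2f unsuccessful=%.2f%n",
                    alpha, successful(alpha), unsuccessful(alpha));
        }
    }
}
```

At \(\alpha = 0.5\) this gives 1.5 and 2.5 probes, while at \(\alpha = 0.9\) it gives about 5.5 and 50.5, which is why tables are usually kept well below full.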

Hash tables

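Linear probing tends to cluster entries in the hash table.

\(hashTable=\) [null, "Isaac", "Yohan", ?, ?, null, null, null, null, "Katherine"]

Any new entry that hashes to index 1, 2 or 3 is placed at index 3, which makes the occupied run longer and even more likely to be hit by the next insertion.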


Hash tables

Instead of trying to insert into H(str) + 1, H(str) + 2, H(str) + 3, ..., we probe quadratically, i.e. H(str) + \(1^2\), H(str) - \(2^2\), H(str) + \(3^2\), ... (all indices taken modulo the table size). This is quadratic probing.

Notice the - sign: we alternate between + and -.

For a given load factor \(\alpha = \frac{\text{number occupied cells}}{\text{table size}}\), the avg # of probes during a search is:

Successful: \(1 - \ln(1-\alpha) - \frac{\alpha}{2}\)

Unsuccessful: \(\frac{1}{1-\alpha} - \alpha - \ln(1-\alpha)\)
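A small sketch of this probe sequence, following the alternating pattern exactly as written on the slide (other texts use slightly different sequences). The class QuadraticProbingTable and its method names are hypothetical, and the insert simply gives up after as many probes as there are cells.

```java
// Hypothetical fixed-size table using the alternating quadratic sequence.
final class QuadraticProbingTable {
    private final Object[] cells;

    QuadraticProbingTable(int size) {
        cells = new Object[size];
    }

    // i-th probe: home, home + 1^2, home - 2^2, home + 3^2, ...
    // floorMod keeps the result inside [0, size) even for negative offsets.
    static int probeIndex(int home, int i, int size) {
        int offset = (i % 2 == 1) ? i * i : -(i * i);
        return Math.floorMod(home + offset, size);
    }

    // Returns the index used, or -1 if no free cell was found on the sequence.
    int insert(Object key) {
        int home = Math.floorMod(key.hashCode(), cells.length);
        for (int i = 0; i < cells.length; i++) {
            int index = probeIndex(home, i, cells.length);
            if (cells[index] == null) {
                cells[index] = key;
                return index;
            }
        }
        return -1; // quadratic probing may miss free cells
    }

    public static void main(String[] args) {
        QuadraticProbingTable table = new QuadraticProbingTable(10);
        System.out.println(table.insert("Katherine")); // 9
        System.out.println(table.insert("Isaac"));     // 1
        System.out.println(table.insert("Yohan"));     // 2 (1 + 1^2)
    }
}
```

Unlike linear probing, later probes jump further away from the home cell, so colliding keys are spread out instead of forming one long run.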

Hash tables

Instead of trying to insert into \(H\)(str) + 1, \(H\)(str) + 2, \(H\)(str) + 3, ..., we use a second hash function \(H_2\) and try to insert at \(H\)(str) + \(H_2\)(str), \(H\)(str) + 2\(H_2\)(str), \(H\)(str) + 3\(H_2\)(str), ... (all indices taken modulo the table size). This is double hashing.

For a given load factor \(\alpha = \frac{\text{number occupied cells}}{\text{table size}}\), the avg # of probes during a search is:

Successful: \(\frac{1}{\alpha}\ln\left(\frac{1}{1-\alpha}\right)\)

Unsuccessful: \(\frac{1}{1-\alpha}\)
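A minimal sketch of double hashing. The slides do not say which \(H_2\) to use, so the second hash below (1 plus the hash code modulo size - 1) is an assumption chosen so that the step is never 0. The example also assumes a prime table size (11 rather than 10) so that every step size eventually visits every cell.

```java
// Hypothetical fixed-size table using double hashing; size is assumed prime.
final class DoubleHashingTable {
    private final Object[] cells;

    DoubleHashingTable(int primeSize) {
        cells = new Object[primeSize];
    }

    // First hash H: the home index.
    int home(Object key) {
        return Math.floorMod(key.hashCode(), cells.length);
    }

    // Assumed second hash H2: always in [1, size - 1], never 0.
    int step(Object key) {
        return 1 + Math.floorMod(key.hashCode(), cells.length - 1);
    }

    // Probes home, home + step, home + 2 * step, ... (mod table size).
    // Returns the index used, or -1 if the table is full.
    int insert(Object key) {
        int home = home(key);
        int step = step(key);
        for (int i = 0; i < cells.length; i++) {
            int index = (home + i * step) % cells.length;
            if (cells[index] == null) {
                cells[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        DoubleHashingTable table = new DoubleHashingTable(11);
        System.out.println(table.insert("Katherine")); // 10
        System.out.println(table.insert("Isaac"));     // 2
        System.out.println(table.insert("Yohan"));     // 1 (home 10 is taken, step is 2)
    }
}
```

Two keys with the same home index usually get different steps, so they follow different probe sequences and do not pile up behind each other.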

Hash tables

Alternatively, we don't worry about probing at all and keep the entries of each cell in a linked list.

\(hashTable=\) [null, "Isaac" -> "Yohan", null, null, null, null, null, null, null, "Katherine"]

Hash tables

// In Java, this might look like
import java.util.ArrayList;
import java.util.LinkedList;

ArrayList<LinkedList<Object>> hashTable =
      new ArrayList<LinkedList<Object>>(10);

for (int i = 0; i < 10; i++) {
    hashTable.add(new LinkedList<Object>());
}
        
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";     // str3.hashCode() % 10 = 1

hashTable.get(str1.hashCode() % 10).add(str1);
hashTable.get(str2.hashCode() % 10).add(str2);
hashTable.get(str3.hashCode() % 10).add(str3);

This is with a linked list

Hash tables

// In Java, this might look like
import java.util.ArrayList;

ArrayList<ArrayList<Object>> hashTable =
      new ArrayList<ArrayList<Object>>(10);

for (int i = 0; i < 10; i++) {
    hashTable.add(new ArrayList<Object>());
}
        
String str1 = "Katherine"; // str1.hashCode() % 10 = 9
String str2 = "Isaac";     // str2.hashCode() % 10 = 1
String str3 = "Yohan";     // str3.hashCode() % 10 = 1

hashTable.get(str1.hashCode() % 10).add(str1);
hashTable.get(str2.hashCode() % 10).add(str2);
hashTable.get(str3.hashCode() % 10).add(str3);

This is with a resizable list

Hash tables

\(hashTable=\) [null, "Isaac" -> "Yohan", null, null, null, null, null, null, null, "Katherine"]

If we're searching for something in the hash table, we hash the object, then search the list stored in that cell. On average this takes \(\Theta(1+\alpha)\) time. What if this list was sorted? What about deletion?
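As one possible answer to the deletion question, here is a sketch of search and deletion with a linked list per cell. The class ChainedTable and its method names are made up for this example; it keeps the lists unsorted.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical table that resolves collisions with one linked list per cell.
final class ChainedTable {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Picks the list for this key; floorMod handles negative hash codes.
    private LinkedList<Object> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void insert(Object key) {
        LinkedList<Object> bucket = bucketFor(key);
        if (!bucket.contains(key)) {
            bucket.add(key);
        }
    }

    // Hash the key, then scan only the list in that cell.
    boolean contains(Object key) {
        return bucketFor(key).contains(key);
    }

    // Deletion just unlinks the key from its list; other entries are untouched.
    boolean delete(Object key) {
        return bucketFor(key).remove(key);
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(10);
        table.insert("Katherine");
        table.insert("Isaac");
        table.insert("Yohan");
        System.out.println(table.delete("Isaac"));   // true
        System.out.println(table.contains("Yohan")); // true
    }
}
```

Removing "Isaac" does not disturb "Yohan" even though both share index 1, which is harder to guarantee with probing, where an emptied cell can cut a probe sequence short.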

Heaps

A heap is a complete binary tree with the property that every node is greater than or equal to each of its children (max heap).

As an array this looks like A=[10,3,5,1,2]
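The array form works because parent and child positions can be computed from the index. A short sketch, assuming 0-based indexing (the class HeapLayout and its method names are made up for this example):

```java
// Hypothetical helpers for a binary heap stored in a 0-based array.
final class HeapLayout {
    static int parent(int i) {
        return (i - 1) / 2;
    }

    static int left(int i) {
        return 2 * i + 1;
    }

    static int right(int i) {
        return 2 * i + 2;
    }

    // A max heap requires every element to be no larger than its parent.
    static boolean isMaxHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[parent(i)] < a[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {10, 3, 5, 1, 2};
        System.out.println(isMaxHeap(a)); // true
    }
}
```

In A=[10,3,5,1,2] the root 10 has children 3 and 5, and 3 has children 1 and 2, so the max heap property holds everywhere.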

Heapsort

Take the top of the tree off the heap, swap it with the last element, and reheap-ify

As an array we swap with the last element, and reheap-ify on a smaller list.

Heapsort

Take the top of the tree off the heap, swap it with the last element, and reheap-ify

As an array this looks like A=[2,3,5,1,10]

Heapsort

Take the top of the tree off the heap, swap it with the last element, and reheap-ify

As an array this looks like A=[5,2,3,1,10]

Heapsort

Take the top of the tree off the heap, swap it with the last element, and reheap-ify

As an array this looks like A=[1,2,3,5,10]

Heapsort

Keep doing this... (the sorted part of the array grows from index n-1 down to 0)

As an array this looks like A=[1,2,3,5,10]
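Putting the steps together, here is a sketch of in-place heapsort. It reheap-ifies by sifting the new root down, which is one common choice; the intermediate arrays can differ from the ones on the slides depending on how reheap-ify is implemented, but the final result is the same. The class and method names are made up for this example.

```java
import java.util.Arrays;

// Hypothetical in-place heapsort using a max heap in a 0-based array.
final class HeapSortDemo {
    // Restores the max heap property for the subtree rooted at i,
    // looking only at the first `size` elements.
    static void siftDown(int[] a, int i, int size) {
        while (true) {
            int largest = i;
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            if (left < size && a[left] > a[largest]) largest = left;
            if (right < size && a[right] > a[largest]) largest = right;
            if (largest == i) return;
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    static void heapSort(int[] a) {
        // Build a max heap from the unsorted array.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
        // Swap the max with the last heap element, then reheap-ify the rest.
        for (int end = a.length - 1; end > 0; end--) {
            int tmp = a[0];
            a[0] = a[end];
            a[end] = tmp;
            siftDown(a, 0, end);
        }
    }

    public static void main(String[] args) {
        int[] a = {10, 3, 5, 1, 2};
        heapSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 10]
    }
}
```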