Hashes

Purpose

The purpose of this project was to explore the numerous collision resolution techniques that are used to optimize hash based data structures and gain a better understanding of the inner workings of hash data structures. Another major aspect of this project was the introduction of benchmarking code to measure time and memory usage. The data we gathered from benchmarking was used for analysis where we eventually had to decide between time complexity and space complexity in order to choose the best method of handling collisions.

Planning

The main concept of this hash implementation was that we would be using an array (or multiple depending on the algorithm) and generate indexes for new elements based on the size of the array. The goal for this project was to implement Linear Probing, Quadratic Probing, LinkedList, and Cuckoo hashing in order to benchmark the performance of each resolution technique to determine which had the most optimal combination of memory usage and performance. 

Implementation

Linear Probing (LP)

Linear Probing is a collision resolution algorithm that sequentially checks for the next open address. This was implemented using a for loop which continually incremented an index variable until an open memory address was found.

Quadratic Probing (QP)

Quadratic Probing is similar to Linear Probing but instead of increasing an index variable by increments of one, Quadratic Probing searches exponentially higher indexes until it finds an open address.

LinkedList (LL)

An array of Linked Lists can be used as a collision resolution technique which bypasses the issue of two elements having the same hash index by putting both elements in the same Linked List. 

Cuckoo Hashing

Cuckoo Hashing is an algorithm which uses two separate tables which have unique hash functions to exchange elements whenever a collision is found. The idea is that elements will continue to be exchanged until a loop is found and both tables will grow in size.

Analysis

During benchmarking, we measured the performances of every execution with each collision resolution method and plotted this information on graphs. We also analyzed the actual datasets with a total of 19 graphs being used to represent all the data that was collected. What I noticed was that each technique usually had a tradeoff between performance and memory utilization. Cuckoo Hashing, for instance, had extremely fast performance but only used around 10% of the memory allocated to the array even under full load factor. In comparison, Linear Probing ran considerably slower but had nearly 100% memory utilization at full load. 

The main key learning that I got from this project is that when choosing between algorithms to use for a specific task, each algorithm tends to have its own positives and drawbacks. For example, Cuckoo Hashing had by far the fastest performance but came at a memory cost that would be unacceptable in a real world use case. I also learned about the importance of benchmarking and closely analyzing performance using tools such as Google Sheets and Excel to visualize data so that I can make more informed decisions when choosing algorithms. This is something that rarely comes up in hobbyist programming but is mandatory in real world applications which need to be optimized for speed and efficiency.