HuffmanCompression
Purpose
The goal of this project was to demonstrate the use of more advanced data structures, such as binary trees, in real world applications. We implemented a Huffman Coding algorithm which generates codes for each character of an input string based on the overall frequency of the character which represents the character's location in a binary string. This method is able to save around 40% space utilization for text files and is still widely used as a form of lossless file compression.
Planning
The main concept behind the Huffman Compression algorithm is to create a binary tree which is structured so that the most commonly used characters appear close to the root. This binary tree will then be used to "encode" each character in a text file as a series of 0's and 1's that represent the location of a character in the binary tree. The benefit of constructing the binary tree based on character frequencies is that the encoded values for common characters such as "e" are optimized to require a shorter series of 0's and 1's. The reason this method works to compress text files is that a text file following the UTF-8 standard will use 8 bits for each character. Huffman compression encodes each character with a series of 0's and 1's which only take one bit of storage each, resulting in compressed files.
Implementation (Main Phases)
HuffmanTree
Building a Binary Tree using the weights obtained previously in GenWeights, most frequent characters are closest to the root
BinaryIO
Processing Input and Output for binary files
Encode & Decode
Exporting a compressed binary file based on Huffman Codes, reading a binary file and converting back to characters
Demonstration
As shown in the graph on the left, Huffman Compression is able to reduce the file size of a .txt file by around 40% on average. This graph also shows how this method of lossless file compression works on a wide variety of styles of writing ranging from The Cat in the Hat to War and Peace.
With this project, we were able to showcase an example use case for a more advanced data structure such as a binary tree. This project showed me the numerous uses of data structures and algorithms that make basic computer functions such as compressing files possible. I've personally had a lot of experience using advanced data structures in competitive programming, but this was the first time I've applied it to a real world application.