Data Structures

Tree Node

Each node of the Huffman tree holds:

  • Leaves carry a concrete symbol and its raw frequency.
  • Internal nodes have symbol == '\0'`` and freq = left->freq + right->freq`.

 

Priority Queue Ordering

 

Building the Huffman Tree

  • Step 1 collects frequencies in a hash map.
  • Step 2 seeds the priority queue with leaf nodes.
  • Step 3 repeatedly extracts two smallest‐weight nodes, merges them under a new internal node, and reinserts—ultimately yielding a single root.

 

Generating the Codebook

With the completed tree, a depth-first traversal assigns bit-strings:

  • Leaf: store the accumulated bit-string.
  • Internal: recurse, appending to the prefix.

 

Putting It All Together

The output should look:

  Codewords:
'A': 0
'B': 110
'C': 101
'D': 111
'E': 100
  

 

Time Complexity

The overall running time of our Huffman-coding pipeline breaks down into a few conceptual stages:

Frequency Counting

Scanning the input of length $N$ once to tally how often each symbol appears takes linear time, $O(N)$.

Heap Construction

We insert each of the $n$ distinct symbols into a min-heap (priority queue) based on its frequency. Building the heap in $n$ insertions costs $O(n \log n)$.

Tree Merging

We perform exactly $n−1$ merge steps, and each step removes two nodes and reinserts their parent — three heap operations altogether. Since each such operation is $O(\log n)$, this phase is also $O(n \log n)$.

Code Extraction

Once the tree is complete, a simple traversal visits every node (about $2n−1$ of them) in linear time, $O(n)$.

Overall Time Complexity

Putting these together gives

$$ O(N)+O(n \log n)+O(n \log n)+O(n)=O(N+n \log n) $$

  • If the symbol alphabet is bounded (for example ASCII or Unicode blocks), nn is effectively constant and the entire algorithm runs in linear time $O(N)$.
  • In the extreme case where every input character is unique ($n=N$), it becomes $O(N \log ⁡N)$.

Thus, Huffman coding scales almost linearly in practice and remains efficient even for large inputs with moderately sized alphabets.

 

Code on GitHub

See the implementation code for Huffman’s algorithm.

Last updated 09 May 2025, 23:43 +0500 . history