The Problem

Now let’s define the encoding problem. You have a set of symbols, each appearing with some known frequency (or probability). You want to assign each symbol a distinct binary string (a codeword) so that:

  • No codeword is a prefix of any other — ensuring instantaneous, unambiguous decoding.
  • The average length of the encoded message (weighted by symbol frequencies) is as small as possible.

 

Input

  • A source alphabet $\Sigma = \lbrace \sigma_1, \sigma_2, …, \sigma_n \rbrace$.
  • A nonnegative frequency $p_\sigma$ for each symbol $\sigma$ of an alphabet $\Sigma$ of size $n \ge 2$.

Output

The prefix-free binary code with minimum possible average encoding length:

$$ \sum_{\sigma \in \Sigma} p_\sigma \cdot \text{(number of bits used to encode } \sigma \text{)} $$

Last updated 11 May 2025, 18:45 +0500 . history