Imc12 03 Huffman Adaptive Lec


    Huffman Coding


    Huffman Codes

Optimum prefix code, developed by D. Huffman in a class assignment.

Construction of Huffman codes is based on two ideas:

In an optimum code, symbols with higher probability should have shorter codewords.

In an optimum prefix code, the two symbols that occur least frequently will have the same codeword length (otherwise, truncating the longer codeword to the length of the shorter one would still produce a decodable code).



    Principle of Huffman Codes

Starting with the two least probable symbols, γ and δ, of an alphabet A: if the codeword for γ is [m]0, the codeword for δ would be [m]1, where [m] is a string of 1s and 0s.

Now the two symbols can be combined into a group, which represents a new symbol in the alphabet set. The new symbol has probability P(γ) + P(δ).

Recursively determine the bit pattern [m] using the new alphabet set.



    Example: Huffman Code

Let A = {a1, ..., a5}, P(ai) = {0.2, 0.4, 0.2, 0.1, 0.1}.

Symbol   Step 1    Step 2    Step 3    Step 4    Codeword
a2       0.4       0.4       0.4       0.6 (0)   1
a1       0.2       0.2       0.4 (0)   0.4 (1)   01
a3       0.2       0.2 (0)   0.2 (1)             000
a4       0.1 (0)   0.2 (1)                       0010
a5       0.1 (1)                                 0011

(Each column lists the current probabilities in decreasing order; (0)/(1) mark the bits assigned to the two groups merged at that step.)

Combine the two symbols with the lowest probabilities, and use one bit (the last bit in the codeword) to differentiate between them!
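As an aside, this merge procedure is easy to prototype. Below is a minimal sketch (not from the slides) using Python's heapq: it repeatedly pops the two least probable groups and prepends a distinguishing bit. Tie-breaking among equal probabilities is arbitrary, so the exact bit patterns may differ from the table, but the average length is the same optimal 2.2 bits/symbol.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code by repeatedly merging the two
    least probable groups; one bit differentiates the merged pair."""
    tiebreak = count()  # unique tag so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, grp0 = heapq.heappop(heap)   # least probable group
        p1, _, grp1 = heapq.heappop(heap)   # second least probable group
        merged = {s: "0" + c for s, c in grp0.items()}
        merged.update({s: "1" + c for s, c in grp1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

P = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman_code(P)
print(code)
print(sum(P[s] * len(c) for s, c in code.items()))   # 2.2 bits/symbol
```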



    Efficiency of Huffman Codes

Redundancy - the difference between the entropy and the average length of a code.

The average codeword length for this code is

l_avg = 0.4 x 1 + 0.2 x 2 + 0.2 x 3 + 0.1 x 4 + 0.1 x 4 = 2.2 bits/symbol.

The entropy is H = 2.122 bits/symbol, so the redundancy is 0.078 bits/symbol.

For a Huffman code, the redundancy is zero when the probabilities are negative powers of two.
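These numbers are quick to verify with a few lines of Python (a check of my own, not part of the slides):

```python
from math import log2

P = [0.4, 0.2, 0.2, 0.1, 0.1]          # symbol probabilities
lengths = [1, 2, 3, 4, 4]              # codeword lengths from the table

entropy = -sum(p * log2(p) for p in P)           # H(S)
l_avg = sum(p * l for p, l in zip(P, lengths))   # average codeword length
print(f"H = {entropy:.3f}, l_avg = {l_avg:.1f}, "
      f"redundancy = {l_avg - entropy:.3f}")
# -> H = 2.122, l_avg = 2.2, redundancy = 0.078 bits/symbol
```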



    Minimum Variance Huffman Codes

When more than two symbols in a Huffman tree have the same probability, different merge orders produce different Huffman codes.

Symbol   Step 1    Step 2    Step 3    Step 4    Codeword
a2       0.4       0.4       0.4       0.6 (0)   00
a1       0.2       0.2       0.4 (0)   0.4 (1)   10
a3       0.2       0.2 (0)   0.2 (1)             11
a4       0.1 (0)   0.2 (1)                       010
a5       0.1 (1)                                 011

The average codeword length is still 2.2 bits/symbol, but the variances of the two codes are different!

Two code trees with the same symbol probabilities: we prefer the code with the smaller length-variance. Why? With a fixed transmission rate, a smaller variance means the encoder's bit output is more uniform, so smaller buffers suffice.
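To make "variances are different" concrete, a small computation (my own illustration) comparing the two sets of codeword lengths:

```python
P = [0.4, 0.2, 0.2, 0.1, 0.1]   # P(a2), P(a1), P(a3), P(a4), P(a5)
code1 = [1, 2, 3, 4, 4]         # lengths from the earlier table
code2 = [2, 2, 2, 3, 3]         # minimum-variance lengths above

for lengths in (code1, code2):
    l_avg = sum(p * l for p, l in zip(P, lengths))
    var = sum(p * (l - l_avg) ** 2 for p, l in zip(P, lengths))
    print(f"l_avg = {l_avg:.1f}, variance = {var:.2f}")
# -> l_avg = 2.2, variance = 1.36   (first code)
# -> l_avg = 2.2, variance = 0.16   (minimum-variance code)
```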



    Length of Huffman Codes (1/2)

Given a sequence of positive integers {l1, l2, ..., lk} that satisfies the Kraft-McMillan inequality

$\sum_{i=1}^{k} 2^{-l_i} \le 1,$

there exists a uniquely decodable code whose codeword lengths are given by {l1, l2, ..., lk}.

The optimal code for a source S has an average code length l_avg with the following bounds:

$H(S) \le l_{avg} < H(S) + 1.$
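A two-line check of the condition; the lengths {1, 2, 3, 4, 4} are from the earlier example (the counterexample lengths are my own):

```python
def kraft_sum(lengths, radix=2):
    """Kraft-McMillan sum; a uniquely decodable code with these
    codeword lengths exists iff the sum is <= 1."""
    return sum(radix ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 4, 4]))  # 1.0  -> the Huffman code uses the budget fully
print(kraft_sum([1, 2, 2, 2]))     # 1.25 -> no decodable code with these lengths
```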


    Non-binary Huffman Codes

Huffman codes can be applied to an n-ary code space. For example, with codewords composed of {0, 1, 2}, we have a ternary Huffman code.

Let A = {a1, ..., a5}, P(ai) = {0.25, 0.25, 0.2, 0.15, 0.15}.

Symbol   Step 1     Step 2      Codeword
a1       0.25       0.5  (0)    1
a2       0.25       0.25 (1)    2
a3       0.20 (0)   0.25 (2)    00
a4       0.15 (1)               01
a5       0.15 (2)               02

(Each step merges the three lowest entries, using one ternary digit to differentiate them.)
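A sketch of the n-ary generalization (not from the slides). One detail the slides don't spell out: if (k - 1) is not a multiple of (n - 1), zero-probability dummy symbols are padded in so every merge can take exactly n groups; for k = 5 and n = 3 no padding is needed. Digit assignments may differ from the table under ties, but the codeword lengths match.

```python
import heapq
from itertools import count

def nary_huffman(probs, n=3, digits="012"):
    """n-ary Huffman: pad with dummy symbols so every merge can take
    exactly n groups, then repeatedly merge the n least probable."""
    tb = count()
    heap = [(p, next(tb), {s: ""}) for s, p in probs.items()]
    while (len(heap) - 1) % (n - 1) != 0:
        heap.append((0.0, next(tb), {}))   # dummies carry no real symbols
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0.0
        for d in range(n):                 # one code digit per merged group
            p, _, grp = heapq.heappop(heap)
            total += p
            merged.update({s: digits[d] + c for s, c in grp.items()})
        heapq.heappush(heap, (total, next(tb), merged))
    return heap[0][2]

P = {"a1": 0.25, "a2": 0.25, "a3": 0.2, "a4": 0.15, "a5": 0.15}
print(nary_huffman(P))  # two 1-digit and three 2-digit codewords, as in the table
```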



    Adaptive Huffman Coding

Huffman codes require an exact probability model of the source to compute the optimal codewords. For messages of unknown duration, this is not possible.

One can try to re-compute the probability model after every received symbol and re-generate a new set of codewords, based on the new model, for the next symbol. Doing this from scratch every time is too expensive!

Adaptive Huffman coding tries to achieve this goal at a lower cost.



    Adaptive Huffman Coding Tree

Adaptive Huffman coding maintains a dynamic code tree. The tree is updated synchronously on both the transmitter side and the receiver side. If the alphabet size is m, the total number of nodes is 2m - 1; in the example tree, the root is node 51 = 2m - 1 for an alphabet of m = 26 symbols.

Weight of a node: the number of occurrences of the symbol (for a leaf), or of all the symbols in its subtree (for an internal node).

Node number: the unique ID of each node, sorted in decreasing order from top to bottom and left to right.

A special NYT node stands for all symbols Not Yet Transmitted.

[Figure: example code tree with root node 51 and internal nodes numbered 43-50.]



    Initial Codewords

Before transmission of any symbols, all symbols in the source alphabet {a1, a2, ..., am} belong to the NYT list.

Each symbol in the alphabet has an initial fixed-length binary codeword of either ⌊log2 m⌋ or ⌊log2 m⌋ + 1 bits.

When a symbol ai is transmitted for the first time, the code for NYT is transmitted, followed by the fixed code for ai. A new node is created for ai, and ai is taken out of the NYT list.

From this point on, we follow the update procedure to maintain the Huffman code tree.
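The slide does not say how the ⌊log2 m⌋ vs. ⌊log2 m⌋ + 1 split is made. A sketch of one standard convention (writing m = 2^e + r with 0 <= r < 2^e and giving the first 2r symbols e + 1 bits, as in Sayood's textbook treatment; this specific split is an assumption here, not stated on the slide):

```python
def initial_code(k, m):
    """Fixed-length code for the k-th symbol (1-based) of an m-symbol
    alphabet, sent after the NYT code on a symbol's first occurrence.
    Convention: m = 2**e + r; the first 2r symbols get e + 1 bits,
    the remaining m - 2r symbols get e bits."""
    e = m.bit_length() - 1          # e = floor(log2 m)
    r = m - 2 ** e
    if k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")
    return format(k - r - 1, f"0{e}b")

# m = 26 letters: e = 4, r = 10, so 20 symbols use 5 bits, 6 use 4 bits
print(initial_code(1, 26))   # 'a'  -> 00000
print(initial_code(22, 26))  # 'v'  -> 1011  (22 > 2r, so 4 bits: 22-10-1 = 11)
```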



    Update Procedure

[Flowchart of the update procedure; a block is the set of nodes with the same weight.]
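The flowchart itself did not survive extraction. As a substitute, here is a sketch of the classic FGK-style update this slide family describes: walk from the changed leaf to the root, and before incrementing each node's weight, swap it with the highest-numbered node of the same weight (its block leader) unless that node is its parent. The class names and list-based node numbering are my own scaffolding, not the slides':

```python
class Node:
    def __init__(self, weight=0, symbol=None, parent=None):
        self.weight, self.symbol, self.parent = weight, symbol, parent
        self.left = self.right = None

class AdaptiveTree:
    def __init__(self):
        self.nyt = self.root = Node()   # tree starts as a lone NYT node
        self.order = [self.root]        # nodes in increasing node number;
                                        # sibling property: weights non-decreasing

    def add_symbol(self, symbol):
        """Split NYT into a new NYT (left child) and a new leaf (right)."""
        old = self.nyt
        old.left = self.nyt = Node(parent=old)
        old.right = leaf = Node(symbol=symbol, parent=old)
        i = self.order.index(old)            # the two new nodes take the
        self.order[i:i] = [self.nyt, leaf]   # numbers just below the old NYT
        return leaf

    def update(self, node):
        """Increment weights from `node` up to the root, swapping each
        node with its block leader first."""
        while node is not None:
            i = j = self.order.index(node)   # O(n) here; real code keeps
            while (j + 1 < len(self.order)   # an explicit number field
                   and self.order[j + 1].weight == node.weight):
                j += 1                       # leader = highest number, same weight
            leader = self.order[j]
            if leader is not node and leader is not node.parent:
                self._swap(i, j)
            node.weight += 1
            node = node.parent

    def _swap(self, i, j):
        """Exchange two nodes' positions in the numbering and the tree."""
        a, b = self.order[i], self.order[j]
        self.order[i], self.order[j] = b, a
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b   # decide before mutating
        if a_left:
            pa.left = b
        else:
            pa.right = b
        if b_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa

def code_of(node):
    """Codeword of a node: the 0/1 path from the root down to it."""
    bits = ""
    while node.parent is not None:
        bits = ("0" if node.parent.left is node else "1") + bits
        node = node.parent
    return bits

tree, leaves = AdaptiveTree(), {}
for s in "abb":                        # after two b's, b outweighs a ...
    leaf = leaves.get(s) or tree.add_symbol(s)
    leaves[s] = leaf
    tree.update(leaf)
print(code_of(leaves["b"]), code_of(leaves["a"]))  # ... so b's code shrinks: 1 01
```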



    Encoding Procedure

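The encoding flowchart is likewise missing. A sketch of the loop it presumably describes, reusing the hypothetical AdaptiveTree/code_of scaffolding and the initial_code helper from the previous blocks (so this fragment assumes those definitions rather than standing alone):

```python
def encode(symbols, alphabet):
    """First occurrence of a symbol: send the NYT code, then its
    fixed-length index code. Later occurrences: send its current
    codeword. Update the tree after every symbol."""
    tree, seen, out = AdaptiveTree(), {}, []
    for s in symbols:
        if s in seen:
            out.append(code_of(seen[s]))
        else:
            out.append(code_of(tree.nyt))    # empty for the very first symbol
            out.append(initial_code(alphabet.index(s) + 1, len(alphabet)))
            seen[s] = tree.add_symbol(s)
        tree.update(seen[s])
    return "".join(out)
```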


    Decoding Procedure
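Decoding mirrors encoding: the receiver maintains the identical tree. A sketch (again assuming the scaffolding above, and the index-code convention from the Initial Codewords block) walks bits from the root; reaching the NYT leaf means a fixed-length index follows:

```python
def decode(bits, alphabet):
    """Walk the tree bit by bit; at a leaf emit its symbol, at the
    NYT node read a fixed-length index instead. Update after each."""
    tree, out, i = AdaptiveTree(), [], 0
    m = len(alphabet)
    e = m.bit_length() - 1               # floor(log2 m)
    r = m - 2 ** e
    while i < len(bits):
        node = tree.root
        while node.left is not None:     # descend until a leaf
            node = node.left if bits[i] == "0" else node.right
            i += 1
        if node is tree.nyt:             # a new symbol's index follows
            v = int(bits[i:i + e], 2); i += e
            if v < r:                    # long (e+1)-bit index code
                v = 2 * v + int(bits[i]); i += 1
                k = v + 1
            else:                        # short e-bit index code
                k = v + r + 1
            node = tree.add_symbol(alphabet[k - 1])
        out.append(node.symbol)
        tree.update(node)
    return "".join(out)

# round trip, assuming encode() from the previous sketch:
abc = [chr(c) for c in range(97, 123)]   # 26-letter alphabet, m = 26
assert decode(encode("aardvark", abc), abc) == "aardvark"
```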



    Applications: Image Compression

Direct application of Huffman coding to image data has a limited compression ratio.

[Figures compare coding the pixel values directly (no model prediction) with coding prediction residuals (with model prediction).]
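The figures are gone, but the effect of prediction is easy to demonstrate numerically. A toy illustration of my own, not from the slides: on a smooth signal, the first-order entropy of left-neighbor prediction residuals is far lower than that of the raw values, so a Huffman coder does far better on the residuals.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """First-order entropy in bits/symbol."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

# synthetic smooth scanline: neighboring pixels are strongly correlated
row = [(i // 3) % 256 for i in range(3000)]
residuals = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

print(f"raw pixels:           {entropy(row):.2f} bits/symbol")        # ~ 8 bits
print(f"prediction residuals: {entropy(residuals):.2f} bits/symbol")  # ~ 1 bit
```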

