JPEG Compression

JPEG Huffman

Just to complete the JPEG compression story, we will briefly look at the remaining processes.

After quantization, the 63 AC terms of the DCT are collected using the zigzag method illustrated on the left. This collection order takes advantage of the fact that the high-frequency terms tend to zero after quantization, improving the chance of getting longer runs of zeros, which is ideal for good run-length compression. The 63 AC components from the DCT process are then compressed using lossless run-length encoding.
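
The zigzag scan and run-length pass can be sketched in Python. Note this is a simplification: the (zero-run, value) pair format here stands in for JPEG's actual (run, size)/amplitude coding, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, col) coordinates of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                       # anti-diagonal index
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def run_length_ac(block):
    """Zigzag-scan a block and encode its AC terms as (zero_run, value) pairs."""
    order = zigzag_order(len(block))[1:]       # skip the DC term at (0, 0)
    ac = [block[r][c] for r, c in order]
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1                           # extend the current run of zeros
        else:
            pairs.append((run, v))             # emit (zeros before value, value)
            run = 0
    pairs.append((0, 0))                       # end-of-block marker
    return pairs
```

Because the quantized high-frequency terms are mostly zero, the trailing portion of the scan collapses into the single end-of-block marker.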

The DC component, however, is treated differently. It is assumed that neighbouring 8 x 8 blocks will have a similar average value, so instead of storing the DC level in a large number format, JPEG stores the difference in level from the previous block in a small number format, thereby requiring less code to store the information.
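
This differencing step can be sketched as follows (the first block is assumed to be predicted from zero; the function names are illustrative):

```python
def dc_differences(dc_values):
    """Turn a sequence of per-block DC values into small differences."""
    prev, diffs = 0, []            # the first block is differenced against 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dc_restore(diffs):
    """Invert the differencing to recover the original DC values."""
    prev, values = 0, []
    for d in diffs:
        prev += d
        values.append(prev)
    return values

dc_differences([120, 123, 121, 121])  # → [120, 3, -2, 0]
```

After the first block, the stored numbers are small and cluster around zero, which suits the entropy coding that follows.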

The final stage of the JPEG process is to use lossless Huffman coding to compress all of the run-length-compressed DCT terms. As mentioned earlier, Huffman is a variable-length code based on probabilities, unlike most other codes, which are fixed-length block codes. For Huffman to work well there must be values in the data that occur more frequently than others; if the values are completely random then Huffman coding is pointless.

Now let's run through a typical training example that illustrates the benefits of Huffman compression.

Imagine that digital codes are used to transmit various regional weather conditions to shipping.

Let's consider that there are 4 states for visibility, as outlined in the table below:

Visibility     Code word using a 2-bit block code
-----------------------------------
Very poor      00
Poor           01
Moderate       10
Good           11

There are 4 states, which would require a 2-bit code.

To work out the Huffman code you need to know the probability of each state. In our case the probabilities are as follows:

Visibility     Probability
-----------------------------------
Very poor      0.1
Poor           0.3
Moderate       0.5
Good           0.1

The most probable event gets the shortest code and the least probable events get the longest codes:

Visibility     Code word
------------------------------------
Very poor      111
Poor           10
Moderate       0
Good           110
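
A minimal sketch of the Huffman construction over these probabilities, using Python's heapq to repeatedly merge the two least probable nodes. The exact 0/1 assignments may differ from the table above depending on how ties are broken, but the code lengths (1, 2, 3, 3) come out the same:

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code from a {symbol: probability} map."""
    # Each heap entry: (probability, tiebreak counter, {symbol: code-so-far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        # Prefix one subtree's codes with 0 and the other's with 1, then merge.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"very poor": 0.1, "poor": 0.3, "moderate": 0.5, "good": 0.1})
```

The two 0.1 states merge first, then their combined node merges with "poor", which is why the rare states end up with the 3-bit codes and "moderate" with the single bit.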

Using a block code it will take 2 bits to transmit the information regardless of the visibility condition.

In the case of Huffman the average code length will be

(3 bits x 0.1) + (2 bits x 0.3) + (1 bit x 0.5) + (3 bits x 0.1) = 1.7 binary digits per code.

That is an average saving of 15%.

Care is needed in choosing the Huffman codes so that a string of codes is uniquely decodable. We need to be able to tell where each code starts and finishes, as the codes are no longer of fixed length.

For example:

110011110 = good, moderate, very poor, poor
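
Because the code above is prefix-free (no code word is the start of another), a decoder can emit a symbol the moment the accumulated bits match a code word. A minimal sketch using the table above:

```python
# Code words from the visibility table above.
CODES = {"111": "very poor", "10": "poor", "0": "moderate", "110": "good"}

def decode(bits):
    """Walk the bit string, emitting a symbol whenever a code word matches."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in CODES:          # prefix property: the first match is the symbol
            out.append(CODES[current])
            current = ""
    if current:
        raise ValueError("bit string ended mid-code word")
    return out

decode("110011110")  # → ['good', 'moderate', 'very poor', 'poor']
```

No separators between code words are needed: the prefix property alone tells the decoder where each code ends.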

In some situations the probabilities of events are known in advance, but in the case of image data they have to be calculated for each image before coding can commence. In most cases a coding tree or dictionary has to be constructed and included in the file so that the compressed Huffman data can be decoded when the image file is opened on the computer.