JPEG Compression

 


 

Loss-less Compression

The first data compression methods devised were loss-less. That is, after compression and decompression you get back exactly the original data.

These methods relied on the data being inefficiently coded in the first place to get good compression ratios. As a general rule of thumb, with a few exceptions, if the compression ratio is greater than 10:1 a loss-less method was not used. To improve compression ratios further, lossy methods were developed. That is, after compression and decompression you get back something that looks like the original data but is not identical to it. For some applications it is important that the original data is restored exactly, in which case a loss-less method must be used.

For background information let's take a look at some of the simpler loss-less data compression ideas.

They all take advantage of waste in the data: unused values or inefficiently encoded data.

A simple example is English text with each character stored in an 8-bit byte. A byte can hold 1 of 256 values. A typical text message is made up of 52 upper and lower case letters, 10 digits, a dozen punctuation marks and a few control codes. Over half of the possible values are never used. In this case a 7-bit code (128 values) would be more efficient. Using 7-bit codes instead of 8-bit bytes would reduce the file size by 12.5% (one bit in every eight). This is not a very useful method for graphic images, but it illustrates the point.
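To make the 7-bit idea concrete, here is a minimal sketch in Python. The function name `pack_7bit` and the sample message are made up for illustration; the sketch assumes every character has a code below 128 and simply writes the 7-bit codes back to back, padding the last byte with zero bits.

```python
def pack_7bit(text):
    """Pack 7-bit character codes back to back, dropping the unused eighth bit."""
    bits = 0    # accumulator holding bits not yet written out
    nbits = 0   # number of valid bits in the accumulator
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError("not a 7-bit character: %r" % ch)
        bits = (bits << 7) | code
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1  # keep only the leftover bits
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # pad the final byte with zeros
    return bytes(out)


message = "Compression 101!"  # 16 characters, one byte each as stored
packed = pack_7bit(message)
saving = 1 - len(packed) / len(message)
print(len(message), len(packed), saving)
```

The 16 characters need 16 x 7 = 112 bits, which fit in 14 bytes, so the saving is 2 bytes in 16, or 12.5%.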

Another idea is called Run Length Encoding. This takes advantage of long streams of identical data.

AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR

could be re-encoded as 7A2H5C6Y10R. Each number gives the length of a run and is followed by the value that is repeated.
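The scheme can be sketched in a few lines of Python. The names `rle_encode` and `rle_decode` are illustrative only, and the decoder assumes the data values are never digits themselves, otherwise the counts and values could not be told apart.

```python
import re


def rle_encode(data):
    """Encode each run as <count><value>, for example 'AAAB' becomes '3A1B'."""
    if not data:
        return ""
    out = []
    run_char = data[0]
    run_len = 1
    for ch in data[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_len}{run_char}")
            run_char, run_len = ch, 1
    out.append(f"{run_len}{run_char}")  # flush the final run
    return "".join(out)


def rle_decode(encoded):
    """Reverse rle_encode; assumes the values themselves are never digits."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(value * int(count) for count, value in pairs)


original = "AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR"
encoded = rle_encode(original)
print(encoded)
assert rle_decode(encoded) == original  # loss-less: the round trip is exact
```

The final assertion is the point of a loss-less method: decoding gives back exactly what went in.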

In the case of binary data in a 1-bit monochrome image (as sent by a fax machine)

00000011100001111111101

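could be re-encoded as 634811 (6 zeros, 3 ones, 4 zeros, 8 ones, 1 zero, 1 one). Because the runs must alternate between zeros and ones, only the run lengths need to be stored, not the values.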
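A minimal sketch of the binary case follows. It assumes, as a convention chosen for this example, that the first length always counts zeros, so data starting with a one gets a leading run of length 0. The function names are illustrative.

```python
from itertools import groupby


def bit_run_lengths(bits):
    """Return alternating run lengths, the first one always counting zeros."""
    if set(bits) - {"0", "1"}:
        raise ValueError("input must contain only '0' and '1'")
    runs = [(value, len(list(group))) for value, group in groupby(bits)]
    lengths = [n for _, n in runs]
    if runs and runs[0][0] == "1":
        lengths.insert(0, 0)  # data starts with ones: record an empty run of zeros
    return lengths


def bits_from_run_lengths(lengths):
    """Rebuild the bit string by alternating between zeros and ones."""
    out = []
    value = "0"
    for n in lengths:
        out.append(value * n)
        value = "1" if value == "0" else "0"
    return "".join(out)


bits = "00000011100001111111101"
lengths = bit_run_lengths(bits)
print(lengths)
assert bits_from_run_lengths(lengths) == bits
```

The lengths are kept as a list rather than joined into a digit string such as 634811, since a run of 10 or more would otherwise be ambiguous.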

A long stream of the same value keeps telling us something we already know, so most of it is redundant. This redundant information can be discarded.

There is a big disadvantage with the run length method when the data changes very rapidly.

AHCYR

would be re-encoded as 1A1H1C1Y1R, which is in fact twice the size of the original.
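The effect is easy to measure. The sketch below uses a compact count-then-value encoder built on `itertools.groupby` and a made-up helper, `size_ratio`, that divides the encoded length by the original length. A result above 1.0 means the "compressed" data grew.

```python
from itertools import groupby


def rle_encode(data):
    """Compact run length encoder: each run becomes <count><value>."""
    return "".join(f"{len(list(group))}{value}" for value, group in groupby(data))


def size_ratio(data):
    """Encoded length divided by original length; above 1.0 means expansion."""
    return len(rle_encode(data)) / len(data)


long_runs = "AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR"
no_runs = "AHCYR"
print(size_ratio(long_runs))  # well below 1.0: the runs compress
print(size_ratio(no_runs))    # 2.0: every character gains a count
```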

Rapidly changing data contains more information, and retaining that information takes more code.