# Statistical Lempel–Ziv

Statistical Lempel–Ziv is a concept of lossless data compression technique published by Dr. Sam Kwong and Yu Fan Ho in 2001.[1] It may be viewed as a variant of the Lempel–Ziv (LZ) based method. The contribution of this concept is to include the statistical properties of the source information while most of the LZ-based compression methods, such as LZ78 and LZW do not take this property into consideration.

## History

The concept of statistical Lempel–Ziv was first proposed by Yu Fan Ho in 2000 as the research topic of the Master degree in the Department of Computer Science of the City University of Hong Kong. Dr. Sam Kwong was Ho's supervisor in this research topic.

In Feb, 2001, the paper on the title "A Statistical Lempel–Ziv compression algorithm for personal digital assistant (PDA)" was published in the IEEE Transactions on Consumer Electronics.

In 2004, Ho successfully applied statistical Lempel–Ziv to a compression algorithm specific for polyphonic melody data. It was useful for the popular mobile phone or handhelds as polyphonic ring tones. Ho proved that the compression ratio, decompression speed and memory consumption outperformed the commonly used lossless compressors such as LZ77, zip, etc., although the compression speed is lower. Fortunately, the compression speed is not essential because compression of the ring tones for handheld devices were preprocessed in factory and not in the devices.

In March 2009, the application of statistical Lempel–Ziv on melody data was granted a patent by the USPTO with the United States Patent number 7,507,897.[2]

## Background

Traditional LZ-based technologies make use of the repetitive characteristic of the data. The decompression process can be done simply by copying the repeated data from the search window according to an index in the compressed data. The data not found in the window is left uncompressed in the compressed data. The uncompressed data is then shifted into the search window for the next repetition and so on. The data is shifted into the window unconditionally without considering the statistical information. Because of limited size of the search window, the first-in data is shifted out unconditionally when the window is full. There are high possibilities that the window is occupied by the useless (non-repetitive) data while the useful (to be repeated) data is banished. To improve the compression ratio, larger search window should be used and hence more memory required in the decompressor.

Statistical Lempel–Ziv is a LZ-like lossless compression algorithm but statistical information is also taken into consideration to identify the useful data that should be put into the dictionary (search window). It improves the compression ratio compared with LZ77 because more useful data can be kept in the dictionary. The dictionary can be smaller in size for keeping the useful data and hence less memory required in the decompressor. Since not all the data has to be shifted into the window, less processing power is required on the decompressor.

## References

1. ^ Kwong, S. & Ho, Y.F., "A statistical Lempel–Ziv compression algorithm for personal digital assistant (PDA)", IEEE Transactions on Consumer Electronics, Vol. 47, Issue 1, pp. 154-162, Feb 2001
2. ^ Dictionary-based compression of melody data and compressor/decompressor for the same, U.S. patent: 7,507,897