Lempel-Ziv-Markov chain algorithm

From Wikipedia, the free encyclopedia

The Lempel-Ziv-Markov chain algorithm (LZMA) is a data compression algorithm. It has been under development since 1998[1] and is used in the 7z format of the 7-Zip archiver. It uses a dictionary compression scheme somewhat similar to LZ77 and features a high compression ratio (generally higher than that of bzip2[2][3]) and a variable compression-dictionary size (up to 4 GB).[4]

Overview

LZMA uses an improved variant of the LZ77 compression algorithm, backed by a range encoder.

The streams for literal data, repeated-sequence lengths and repeated-sequence locations appear to be compressed separately.[citation needed]
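To make the idea of separate streams concrete, the following is a minimal sketch of a greedy LZ77-style parser in Python. It is a toy for illustration only and is not the LZMA parser: the function names, the window size and the length limits are assumptions, the search is brute force, and no range encoding is applied to the resulting streams.

```python
def lz77_tokenize(data: bytes, window: int = 4096, min_len: int = 3, max_len: int = 273):
    """Greedy LZ77-style parse into separate streams (toy example, not LZMA itself)."""
    flags = []                 # 0 = literal, 1 = match; tells the decoder which stream to read
    literals = bytearray()     # literal data stream
    lengths = []               # repeated-sequence sizes
    distances = []             # repeated-sequence locations, as distances back from the cursor
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force scan of the sliding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            flags.append(1)
            lengths.append(best_len)
            distances.append(best_dist)
            i += best_len
        else:
            flags.append(0)
            literals.append(data[i])
            i += 1
    return flags, bytes(literals), lengths, distances


def lz77_rebuild(flags, literals, lengths, distances) -> bytes:
    """Inverse of lz77_tokenize: replay literals and back-references in order."""
    out = bytearray()
    lit = iter(literals)
    matches = iter(zip(lengths, distances))
    for flag in flags:
        if flag:
            length, dist = next(matches)
            for _ in range(length):
                out.append(out[-dist])   # byte-wise copy so a match may overlap its own output
        else:
            out.append(next(lit))
    return bytes(out)


sample = b"abracadabra abracadabra abracadabra"
streams = lz77_tokenize(sample)
assert lz77_rebuild(*streams) == sample
```

Keeping the literals, lengths and distances apart means each stream can later be modelled with statistics suited to it, which is the motivation usually given for coding them separately.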

7-Zip reference implementation

The reference implementation of LZMA is included in the 7z and 7-Zip suite of tools. The source code is distributed under the GNU LGPL with a special exception for linked binaries: binaries linked against unmodified LZMA code may be redistributed free of any LGPL requirements (for example, they do not need to permit reverse engineering or binary modification).

The reference open-source LZMA compression library is written in C++ and has the following properties:

  • Compression speed: approximately 1 MB per second on a 2 GHz CPU
  • Decompression speed: between 10 and 20 MB per second on a 2 GHz CPU
  • Support for multi-threading

As of version 4.58 (beta) of the LZMA SDK, an ANSI C reference implementation of the LZMA compression and decompression routines is also available.

The 7-Zip implementation uses several variants of hash chains, binary trees and Patricia tries as the basis for its dictionary search algorithm.
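As an illustration of the simplest of these structures, the sketch below shows a hash-chain match finder in Python. It is not taken from the 7-Zip source: the class name HashChainFinder, the 3-byte hash key, the window size and the chain limit are all assumptions chosen for the example, and binary trees and Patricia tries are not shown.

```python
class HashChainFinder:
    """Toy hash-chain match finder (hypothetical helper, not 7-Zip's code)."""

    def __init__(self, data: bytes, window: int = 1 << 16, max_chain: int = 32):
        self.data = data
        self.window = window
        self.max_chain = max_chain         # cap on candidates examined per lookup
        self.head = {}                     # 3-byte prefix -> most recent position
        self.prev = [-1] * len(data)       # position -> earlier position with the same prefix

    def insert(self, pos: int) -> None:
        """Register position pos so that later lookups can find it."""
        if pos + 3 <= len(self.data):
            key = self.data[pos:pos + 3]
            self.prev[pos] = self.head.get(key, -1)
            self.head[key] = pos

    def find(self, pos: int):
        """Return (length, distance) of the longest match at pos, or (0, 0)."""
        data = self.data
        if pos + 3 > len(data):
            return 0, 0
        best_len, best_dist = 0, 0
        cand = self.head.get(data[pos:pos + 3], -1)
        steps = 0
        # Walk the chain from the newest candidate to older ones.
        while cand >= 0 and pos - cand <= self.window and steps < self.max_chain:
            k = 0
            while pos + k < len(data) and data[cand + k] == data[pos + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, pos - cand
            cand = self.prev[cand]
            steps += 1
        return best_len, best_dist


text = b"the cat sat on the mat, the cat sat"
finder = HashChainFinder(text)
for position in range(24):
    finder.insert(position)
match_at_24 = finder.find(24)   # candidates are the two earlier occurrences of b"the"
```

In this sketch find must be called before insert for the same position, otherwise a position would match itself. Limiting max_chain trades compression ratio for speed, which is one reason an implementation might offer several search structures.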

Decompression-only code for LZMA generally compiles to around 5 kB, and the amount of RAM required during decompression is determined principally by the size of the sliding window used during compression. The small code size and relatively low memory overhead, particularly with smaller dictionary sizes, make the LZMA decompression algorithm well suited to embedded applications.
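The link between the compressor's window and the decompressor's memory can be explored with the lzma module from the Python standard library, which wraps liblzma rather than the 7-Zip reference code. The sketch below is only an illustration: the helper names and the sample data are assumptions. It writes the legacy .lzma container, whose 13-byte header stores the dictionary size as a little-endian 32-bit value in bytes 1 to 4, so a decoder can size its buffer before reading any compressed data.

```python
import lzma


def compress_with_dict(data: bytes, dict_size: int) -> bytes:
    """Compress to the legacy .lzma container with an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)


def header_dict_size(blob: bytes) -> int:
    """Read the dictionary size field from a .lzma header (bytes 1..4, little-endian)."""
    return int.from_bytes(blob[1:5], "little")


payload = b"sensor reading 0042; " * 2000
small = compress_with_dict(payload, 1 << 12)   # 4 KiB window, a plausible embedded setting
large = compress_with_dict(payload, 1 << 20)   # 1 MiB window

# Both streams decode to the same bytes; only the window the decoder must hold differs.
assert lzma.decompress(small) == payload
assert lzma.decompress(large) == payload
print(header_dict_size(small), header_dict_size(large))
```

On a memory-constrained target, the compressor would be configured with a dictionary no larger than the buffer the decoder can afford.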

Algorithm

Paul Sladen summarizes the algorithm:

LZMA is effectively deflate (zlib, gzip, zip) with a larger dictionary size, 32MB instead of 32kB. LZMA stands for Lempel-Ziv-Markov chain-Algorithm, after string back-references have been located, values are reduced using a Markov chain range-encoder (aka arithmetic coding) instead of huffman.[5]
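The range-coding step mentioned in the summary can be sketched as an adaptive binary range coder in Python. This is a simplified illustration, not the reference implementation: the constants (11-bit probabilities, an adaptation shift of 5) and the context model (one 256-node bit tree per previous byte, a plain order-1 Markov model) are assumptions made for the example, and the input to a real LZMA coder would be literals and back-references rather than raw bytes.

```python
PROB_BITS = 11                       # probabilities are 11-bit fixed-point values
PROB_INIT = 1 << (PROB_BITS - 1)     # every context starts at P(bit = 0) = 0.5
MOVE_BITS = 5                        # adaptation speed
TOP = 1 << 24                        # renormalise when range drops below this


def _adapt(probs, index, bit):
    """Move the estimate of P(bit = 0) toward the bit just seen."""
    p = probs[index]
    probs[index] = p + (((1 << PROB_BITS) - p) >> MOVE_BITS) if bit == 0 else p - (p >> MOVE_BITS)


class RangeEncoder:
    def __init__(self):
        self.low, self.range = 0, 0xFFFFFFFF
        self.cache, self.cache_size = 0, 1
        self.out = bytearray()

    def _shift_low(self):
        # Emit the top byte of low, deferring runs of 0xFF until a carry is resolved.
        if self.low < 0xFF000000 or self.low >= (1 << 32):
            carry = self.low >> 32
            byte = self.cache
            while self.cache_size:
                self.out.append((byte + carry) & 0xFF)
                byte = 0xFF
                self.cache_size -= 1
            self.cache = (self.low >> 24) & 0xFF
        self.cache_size += 1
        self.low = (self.low & 0x00FFFFFF) << 8

    def encode_bit(self, probs, index, bit):
        bound = (self.range >> PROB_BITS) * probs[index]
        if bit == 0:
            self.range = bound
        else:
            self.low += bound
            self.range -= bound
        _adapt(probs, index, bit)
        while self.range < TOP:
            self.range <<= 8
            self._shift_low()

    def finish(self) -> bytes:
        for _ in range(5):
            self._shift_low()
        return bytes(self.out)


class RangeDecoder:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 5
        self.range = 0xFFFFFFFF
        self.code = int.from_bytes(data[1:5], "big")   # byte 0 is the encoder's initial cache

    def decode_bit(self, probs, index) -> int:
        bound = (self.range >> PROB_BITS) * probs[index]
        if self.code < bound:
            self.range, bit = bound, 0
        else:
            self.code -= bound
            self.range -= bound
            bit = 1
        _adapt(probs, index, bit)
        while self.range < TOP:
            self.range <<= 8
            self.code = (self.code << 8) | self.data[self.pos]
            self.pos += 1
        return bit


def encode_bytes(data: bytes) -> bytes:
    probs = [PROB_INIT] * (256 * 256)    # one bit tree per previous byte (order-1 context)
    enc, prev = RangeEncoder(), 0
    for byte in data:
        node = 1
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            enc.encode_bit(probs, (prev << 8) | node, bit)
            node = (node << 1) | bit
        prev = byte
    return enc.finish()


def decode_bytes(blob: bytes, count: int) -> bytes:
    probs = [PROB_INIT] * (256 * 256)
    dec, prev, out = RangeDecoder(blob), 0, bytearray()
    for _ in range(count):
        node = 1
        for _bit in range(8):
            node = (node << 1) | dec.decode_bit(probs, (prev << 8) | node)
        prev = node & 0xFF
        out.append(prev)
    return bytes(out)


message = b"abracadabra " * 50
packed = encode_bytes(message)
assert decode_bytes(packed, len(message)) == message
```

Unlike a Huffman code, which spends a whole number of bits on every symbol, a range coder can spend a fraction of a bit on a highly predictable decision, and the context-dependent probabilities are where the "Markov chain" in the name comes in.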

Users

Software that uses or supports LZMA:

Notes

  1. ^ The SDK history file states that it was in development from 1996 and was first used in 7-Zip on 2001-08-30. Aside from some unreferenced comments about 1998, the algorithm appears to have been unpublished before its use in 7-Zip.
  2. ^ Collin, Lasse (2005-05-31). "A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA". The Tukaani Project. Retrieved on 2008-09-02.
  3. ^ Klausmann, Tobias (2008-05-08). "Gzip, Bzip2 and Lzma compared". Blog of an Alpha animal. Retrieved on 2008-09-02.
  4. ^ "Overview of the LZMA format". 7z Format.
  5. ^ Paul Sladen. "Comment on 7-zip, LZMA and blah". Retrieved on 2007-10-06.
  6. ^ "Features/RPM4.6". Red Hat, Inc. Retrieved on 2008-08-30.

External links
