LZWL

LZWL is a syllable-based variant of the character-based LZW compression algorithm.[1]

LZWL can work with syllables produced by any syllable-decomposition algorithm. It can also be applied to words.

Syllables

Wiktionary defines a syllable as:

  1. A unit of human speech that is interpreted by the listener as a single sound, although syllables usually consist of one or more vowel sounds, either alone or combined with the sound of one or more consonants; a word consists of one or more syllables.
  2. The written representation of a given pronounced syllable.

Because the decomposition into syllables is used here for data compression, words do not always need to be decomposed into syllables correctly.

Algorithm

In the initialization step of the character-based version, the dictionary is filled with all characters of the alphabet. Each subsequent step searches for the longest string S that is in the dictionary and matches a prefix of the not-yet-encoded part of the input. The number of phrase S is sent to the output. A new phrase, formed by concatenating S with the character that follows S in the file, is then added to the dictionary, and the current input position is moved forward by the length of S.

Decoding has only one special case to handle: the decoder can receive the number of a phrase that is not yet in the dictionary. In that case the phrase can be reconstructed by concatenating the previously decoded phrase with its own first character.
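The character-based steps can be illustrated with a minimal Python sketch. The function names `lzw_encode` and `lzw_decode`, the `alphabet` parameter, and the use of plain integer lists as output are assumptions made for illustration; a practical coder would also pack the numbers into bits.

```python
def lzw_encode(text, alphabet):
    """Return the list of phrase numbers for text (characters must be in alphabet)."""
    # Initialization: the dictionary holds every single character of the alphabet.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    pos = 0
    while pos < len(text):
        # Find the longest phrase S in the dictionary that matches a prefix
        # of the not-yet-encoded input. Extending one character at a time is
        # enough because every prefix of a stored phrase is stored as well.
        s = text[pos]
        while pos + len(s) < len(text) and s + text[pos + len(s)] in dictionary:
            s += text[pos + len(s)]
        codes.append(dictionary[s])
        # New phrase: S plus the character that follows S (if any is left).
        nxt = pos + len(s)
        if nxt < len(text):
            dictionary[s + text[nxt]] = len(dictionary)
        pos = nxt
    return codes


def lzw_decode(codes, alphabet):
    """Rebuild the text from the phrase numbers produced by lzw_encode."""
    phrases = list(alphabet)
    if not codes:
        return ""
    prev = phrases[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(phrases):
            cur = phrases[code]
        else:
            # The single special case: the number is not in the dictionary yet,
            # so the phrase is the previous phrase plus its own first character.
            cur = prev + prev[0]
        phrases.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)
```

For example, encoding `"abababa"` over the alphabet `"ab"` makes the decoder hit the special case on the last number, because that phrase is used immediately after the encoder creates it.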

The syllable-based version works over an alphabet of syllables. In the initialization step, the empty syllable and small syllables from a database of frequent syllables are added to the dictionary. Finding string S and coding its number works as in the character-based version, except that S is a string of syllables. The number of phrase S is encoded to the output. S can be the empty syllable.

If S is the empty syllable, one syllable K is read from the file and encoded by the methods for coding new syllables. Syllable K is then added to the dictionary. The input position is moved forward by the length of S or, when S is the empty syllable, by the length of K.

Adding a phrase to the dictionary differs from the character-based version. Let S1 denote the phrase found in the next step. If S and S1 are both non-empty, a new phrase is added to the dictionary, formed by concatenating S with the first syllable of S1. This solution has two advantages. First, no strings are created from syllables that appear only once. Second, the decoder can never receive the number of a phrase that is not in the dictionary.
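The syllable-based variant can be sketched in Python under several assumptions that are not part of the description above. The input is taken to be already decomposed into a list of syllables, a new syllable K is written to the output as a plain literal instead of being coded by a dedicated method, and a phrase that is already in the dictionary is not added again. The sketch also assumes that the new phrase follows the order of the input, that is, the earlier phrase followed by the first syllable of the phrase found in the next step. The names `lzwl_encode`, `lzwl_decode`, and `frequent` are hypothetical.

```python
def _initial_phrases(frequent):
    # Number 0 is the empty syllable, followed by the frequent syllables.
    phrases = [()]
    for syl in frequent:
        if (syl,) not in phrases:
            phrases.append((syl,))
    return phrases


def lzwl_encode(syllables, frequent):
    """Encode a list of syllables as phrase numbers and literal new syllables."""
    dictionary = {p: i for i, p in enumerate(_initial_phrases(frequent))}
    out = []
    prev = ()  # phrase found in the previous step
    pos = 0
    while pos < len(syllables):
        # Longest dictionary phrase S (a tuple of syllables) matching the
        # input; it stays empty if the next syllable is unknown.
        s = ()
        while pos + len(s) < len(syllables) and s + (syllables[pos + len(s)],) in dictionary:
            s += (syllables[pos + len(s)],)
        out.append(dictionary[s])
        if s:
            # Delayed update: only when both phrases are non-empty.
            if prev and prev + (s[0],) not in dictionary:
                dictionary[prev + (s[0],)] = len(dictionary)
            pos += len(s)
        else:
            # New syllable K: emitted as a literal (assumption), then stored.
            k = syllables[pos]
            out.append(k)
            dictionary[(k,)] = len(dictionary)
            pos += 1
        prev = s
    return out


def lzwl_decode(stream, frequent):
    """Invert lzwl_encode; every received number is already in the dictionary."""
    phrases = _initial_phrases(frequent)
    known = set(phrases)
    out = []
    prev = ()
    items = iter(stream)
    for number in items:
        s = phrases[number]
        if s:
            new = prev + (s[0],)
            if prev and new not in known:
                phrases.append(new)
                known.add(new)
            out.extend(s)
        else:
            k = next(items)  # the literal syllable that follows number 0
            phrases.append((k,))
            known.add((k,))
            out.append(k)
        prev = s
    return out
```

Because the dictionary update waits until the next phrase is known, the encoder and decoder perform it at the same moment, so the decoder in this sketch needs no special case for unknown numbers.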

External links