Lexicographical order

From Wikipedia, the free encyclopedia
Jump to: navigation, search
For similarly named ordering systems outside mathematics, see Alphabetical order and Collation.
Orderings of the 3-subsets of \scriptstyle \{1,...,6\} (and the corresponding binary vectors)
When the (blue) triples are in lex order the (red) vectors are in revlex order, and vice versa. The arrangements on the right side show colex and revcolex order.

In mathematics, the lexicographic or lexicographical order (also known as lexical order, dictionary order, alphabetical order or lexicographic(al) product) is a generalization of the way the alphabetical order of words is based on the alphabetical order of their component letters.

Definition[edit]

Given two partially ordered sets A and B, the lexicographical order on the Cartesian product A × B is defined as

(a,b) ≤ (a′,b′) if and only if a < a′ or (a = a′ and bb′).

The result is a partial order. If A and B are each totally ordered, then the result is a total order as well.

More generally, one can define the lexicographic order on the Cartesian product of n ordered sets, on the Cartesian product of a countably infinite family of ordered sets, and on the union of such sets.

Motivation and uses[edit]

The name of the lexicographic order comes from its generalizing the order given to words in a dictionary or encyclopedia: a sequence of letters (that is, a word)

a1a2 ... ak

appears in a dictionary before a sequence

b1b2 ... bk

if and only if at the first i where ai and bi differ, ai comes before bi in the alphabet.

That comparison assumes both sequences are the same length. To ensure they are the same length, the shorter sequence is usually padded at the end with enough "blanks" (a special symbol that is treated as coming before any other symbol). This also allows ordering of phrases. For the purpose of dictionaries, etc., padding with blank spaces is always done. See alphabetical order.

For example, the word "Thomas" appears before "Thompson" in dictionaries because the letter 'a' comes before the letter 'p' in the alphabet. The 5th letter is the first that is different in the two words; the first 4 letters are "Thom" in both. Because it is the first difference, the 5th letter is the most significant difference (for an alphabetical ordering).

A lexicographical ordering may not coincide with conventional alphabetical ordering. For example, the numerical order of Unicode codepoints does not always correspond to traditional alphabetic orderings of the characters, which vary from language to language. So the lexicographic ordering induced by codepoint value sorts strings in an unambiguous canonical order, but it does not necessarily "alphabetize" them in the conventional sense.

An important property of the lexicographical order is that it preserves well-orders, that is, if A and B are well-ordered sets, then the product set A × B with the lexicographical order is also well-ordered.

An important exploitation of lexicographical ordering is expressed in the ISO 8601 date formatting scheme, which expresses a date as YYYY-MM-DD. This date ordering lends itself to straightforward computerized sorting of dates such that the sorting algorithm does not need to treat the numeric parts of the date string any differently from a string of non-numeric characters, and the dates will be sorted into chronological order. Note, however, that for this to work, there must always be four digits for the year, two for the month, and two for the day, so for example single-digit days must be padded with a zero yielding '01', '02', ..., '09'.

Another generalization of lexical ordering occurs in social choice theory (the theory of elections). Consider an election in which there are 4 candidates A, B, C and D, each voter expresses a top-to-bottom ordering of the candidates, and the voters' orderings are as follows:

18% 17% 33% 32%
A B C D
B A D B
C C A A
D D B C

The MinMax voting method is a simple Condorcet method that counts the votes as in a round-robin tournament (all possible pairings of candidates) and judges each candidate according to its largest "pairwise" defeat. The winner is the candidate whose largest defeat is the smallest. In the example:

  • The largest defeat of A is by D: 65% (33%+32%) rank D over A.
  • The largest defeat of B is by D: 65% (33%+32%) rank D over B.
  • The largest defeat of C is by A (or B): 67% (18%+17%+32%) rank A over C (and B over C).
  • The largest defeat of D is by C: 68% (18%+17%+33%) rank C over D.

MinMax declares a tie between A and B since the largest defeats for both are the same size, 65%. This is like saying "Thomas" and "Thompson" should be at the same position because they have the same first letter. However, if the defeats are compared lexically, we have the MinLexMax method. With MinLexMax, because the largest defeats of A and B are the same size, their next largest defeats are then compared:

  • A's next largest defeat is by B: 49%. (17%+32%) rank B over A.
  • B's next largest defeat is by A: 51% (18%+33%) rank A over B.

Since B's next largest defeat is larger than A's, MinLexMax elects A, which makes more sense than the MinMax tie since a majority rank A over B.

Another usage in social choice theory is the Ranked Pairs voting method. Although usually defined by a procedure that constructs the order of finish, Ranked Pairs is equivalent to finding which of all possible orders of finish is best according to a minlexmax comparison of the majorities they reverse. In the example above, the Ranked Pairs order of finish is ABCD (which elects A). ABCD affirms the majorities who rank A over B, A over C, B over C and C over D, and reverses the majorities who rank D over A and D over B. The largest majority that ABCD reverses is 65%. The only other ordering that wouldn't reverse a larger majority is BACD (which also reverses 65%). ABCD is a better order of finish than BACD because the lexically relevant set of majorities—the majorities on which ABCD and BACD disagree—is {A over B} and BACD reverses the largest majority in this set.

Case of multiple products[edit]

Suppose


  \{ A_1, A_2, \cdots, A_n \}

is an n-tuple of sets, with respective total orderings


  \{ <_1, <_2, \cdots, <_n \}

The dictionary ordering


\ \ <^{d}

of


  A_1 \times A_2 \times \cdots \times A_n

is then


  (a_1, a_2, \dots, a_n) <^d (b_1,b_2, \dots, b_n) \iff
    (\exists\ m > 0) \  (\forall\ i < m) (a_i = b_i) \land (a_m <_m b_m)

That is, if one of the terms


 \ \   a_m <_m b_m

and all the preceding terms are equal.

Informally,


 \ \  a_1

represents the first letter,


 \ \  a_2

the second and so on when looking up a word in a dictionary, hence the name.

This could be more elegantly stated by recursively defining the ordering of any set


 \ \   C= A_j \times A_{j+1} \times \cdots \times A_k

represented by


 \ \   <^d (C)

This will satisfy


  a <^d (A_i) a'  \iff (a <_i a')

  (a,b) <^d (A_i \times B) (a',b') \iff
    a <^d (A_i) a' \lor ( a=a' \  \land \ b <^d (B) b')

where 
  B = A_{i+1} \times A_{i+2} \times \cdots \times A_n.

To put it more simply, compare the first terms. If they are equal, compare the second terms – and so on. The relationship between the first corresponding terms that are not equal determines the relationship between the entire elements.

Groups and vector spaces[edit]

If the component sets are ordered groups then the result is a non-Archimedean group, because e.g. n(0,1) < (1,0) for all n.

If the component sets are ordered vector spaces over R (in particular just R), then the result is also an ordered vector space.

Ordering of sequences of various lengths[edit]

Given a partially ordered set A, the above considerations allow to define naturally a lexicographical partial order <^\mathrm{d} over the free monoid A* formed by the set of all finite sequences of elements in A, with sequence concatenation as the monoid operation, as follows:

u <^\mathrm{d} v if
  • u is a prefix of v, or
  • u=wau' and v=wbv', where w is the longest common prefix of u and v, a and b are members of A such that a<b, and u' and v' are members of A*.

If < is a total order on A, then so is the lexicographic order <d on A*. If A is a finite and totally ordered alphabet, A* is the set of all words over A, and we retrieve the notion of dictionary ordering used in lexicography that gave its name to the lexicographic orderings. However, in general this is not a well-order, even though it is on the alphabet A; for instance, if A = {a, b}, the language {anb | n ≥ 0} has no least element: ... <d aab <d ab <d b. A well-order for strings, based on the lexicographical order, is the shortlex order.

Similarly it is also possible to compare a finite and an infinite string, or two infinite strings.

Comparing strings of different lengths can also be modeled as comparing strings of infinite length by right-padding finite strings with a special value that is less than any element of the alphabet.

This ordering is the ordering usually used to order character strings, including in dictionaries and indexes.

Quasi-lexicographic order[edit]

The quasi-lexicographic order on the free monoid A over an ordered alphabet A orders strings firstly by length, so that the empty string comes first, and then within strings of fixed length n, by lexicographic order on An.[1]

Generalization[edit]

Consider the set of functions f from a well-ordered set X to a totally ordered set Y. For two such functions f and g, the order is determined by the values for the smallest x such that f(x) ≠ g(x).

If Y is also well-ordered and X is finite, then the resulting order is a well-order. As already shown above, if X is infinite this is in general not the case.

If X is infinite and Y has more than one element, then the resulting set YX is not a countable set, see also cardinal exponentiation.

Alternatively, consider the functions f from an inversely well-ordered X to a well-ordered Y with minimum 0, restricted to those that are non-zero at only a finite subset of X. The result is well-ordered. Correspondingly we can also consider a well-ordered X and apply lexicographical order where a higher x is a more significant position. This corresponds to exponentiation of ordinal numbers YX. If X and Y are countable then the resulting set is also countable.

Monomials[edit]

In algebra it is traditional to order terms in a polynomial, by ordering the monomials in the indeterminates. Such matters are typically left implicit in discussion between humans, but must of course be dealt with exactly in computer algebra, for example for testing the equality of polynomials.

More specifically, the definition of Gröbner bases and their computation are heavily based on the choice of an ordering of the monomials. To define such an ordering, one identifies every monomial (for example x_1 x_2^3 x_4 x_5^2) with its vector of exponents (here [1,3,0,1,2]), and one chooses an ordering on these vectors of integers. This ordering must satisfy some further conditions to be admissible for Gröbner bases; see monomial order for details and the admissibility conditions.

One of these admissible orders is the lexicographical order. Another one is the total degree order, which consists in comparing first the total degrees, and then resolving the conflicts by using the lexicographical order. More generally, every admissible order may be defined as the lexicographical order on the values of a set of n linear forms with real coefficients applied to the vector of exponents (here n is the number of variables).[2]

Decimal fractions[edit]

For decimal fractions from the decimal point, a < b applies equivalently for the numerical order and the lexicographic order on the digital representations, provided that strings with a recurring decimal 9 like .399999... and strings with trailing zeros are omitted. With these restrictions, there is an order-preserving bijection between the numbers and the strings.

Colexicographic order[edit]

Orderings of the 24 permutations of \scriptstyle \{1,...,5\} \, that have a complete 5-cycle
The inversion vectors of permutations in colex order are in revcolex order, and vice versa.

The colexicographic or colex order is a natural order structure of the Cartesian product of two or more ordered sets. Colexicographical ordering is used in the Kruskal-Katona theorem.

Given two partially ordered sets A and B, the colexicographical order on the Cartesian product A × B is defined as

(a,b) ≤ (a′,b′) if and only if b < b′ or (b = b′ and aa′ ).

The result is a partial order. If A and B are totally ordered, then the result is a total order also.

More generally, one can define the colexicographic order on the Cartesian product of n ordered sets.

Suppose


  \{ A_1, A_2, \ldots, A_n \}

is an n-tuple of sets, with respective total orderings


  \{ <_1, <_2, \ldots, <_n \}

The colex ordering


\ \ <^\text{colex}

of


  A_1 \times A_2 \times \cdots \times A_n

is then


  (a_1, a_2, \dots, a_n) <^\text{colex} (b_1,b_2, \dots, b_n) \iff  (\exists\ m > 0) \  (\forall\ i > m) (a_i = b_i) \land (a_m <_m b_m)

The following is an ordering on the 3-element subsets of \{1,2,3,4,5,6\}, based on the colex ordering of the triples obtained by writing the elements of each subset in ascending order:

123 < 124 < 134 < 234 < 125 < 135 < 235 < 145 < 245 < 345 <
126 < 136 < 236 < 146 < 246 < 346 < 156 < 256 < 356 < 456

Reverse lexicographic order[edit]

In a common variation of lexicographic order, one compares elements by reading from the right instead of from the left, i.e., the right-most component is the most significant, e.g. applied in a rhyming dictionary.

In the case of monomials one may sort the exponents downward, with the exponent of the first base variable as primary sort key, e.g.:

 x^2 y z^2 < x y^3 z^2 .

Alternatively, sorting may be done by the sum of the exponents, downward.

See also[edit]

References[edit]

  1. ^ Calude, Cristian (1994). Information and randomness. An algorithmic perspective. EATCS Monographs on Theoretical Computer Science. Springer-Verlag. p. 1. ISBN 3-540-57456-5. Zbl 0922.68073. 
  2. ^ Weispfenning, Volker (May 1987), Admissible Orders and Linear Forms, SIGSAM Bulletin (New York, NY, USA: ACM) 21 (2): 16–18, doi:10.1145/24554.24557 .