Dyck language

From Wikipedia, the free encyclopedia
Jump to: navigation, search
Lattice of the 14 Dyck words of length 8 - [ and ] interpreted as up and down

In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language is the language consisting of balanced strings of square brackets [ and ]. It is important in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions. It is named after the mathematician Walther von Dyck.

Formal definition[edit]

Let \Sigma = \{ [, ] \} be the alphabet consisting of the symbols [ and ] and let \Sigma^{*} denote its Kleene closure. For any element u \in \Sigma^{*} with length | u | we define partial functions insert : \Sigma^{*} \times \mathbb{N} \cup \{ 0 \} \rightarrow \Sigma^{*} and delete : \Sigma^{*} \times \mathbb{N} \rightarrow \Sigma^{*} by

insert(u, j) is u with "[]" inserted into the jth position
delete(u, j) is u with "[]" deleted from the jth position

with the understanding that insert(u, j) is undefined for j > |u| and delete(u, j) is undefined if j > |u| - 2. We define an equivalence relation R on \Sigma^{*} as follows: for elements a, b \in \Sigma^{*} we have (a, b) \in R if and only if there exists a finite sequence of applications of the insert and delete functions starting with a and ending with b, where the empty sequence is allowed. That the empty sequence is allowed accounts for the reflexivity of R. Symmetry follows from the observation that any finite sequence of applications of insert to a string can be undone with a finite sequence of applications of delete. Transitivity is clear from the definition.

The equivalence relation partitions the language \Sigma^{*} into equivalence classes. If we take \epsilon to denote the empty string, then the language corresponding to the equivalence class \operatorname{Cl}(\epsilon) is called the Dyck language.

Alternative definition[edit]

An alternative definition of the Dyck language can be formulated when we introduce the imbalance : \Sigma^{*} \rightarrow \left(\mathbb{N} \cup \{0\}\right) function.

imbalance ( u ) = |u|_{[} - |u|_{]} for any u \in \Sigma^{*}.

where |u|_{[} and |u|_{]} are respectively the number of [ and ] in u. I.e. imbalance counts the imbalance of [ over ]. If imbalance(u) is positive then u has more [ than ].

Now, the Dyck language can be defined as the language

\{ u \in \Sigma^{*} \vert imbalance(u) = 0 \text{ and } imbalance(v) \ge 0 \text{ for all prefixes } v \text{ of } u \}



We can define an equivalence relation L on the Dyck language \mathcal{D}. For u,v\in\mathcal{D} we have (u,v)\in L if and only if |u| = |v|, i.e. u and v have the same length. This relation partitions the Dyck language \mathcal{D} / L = \mathcal{D}_{0} \cup \mathcal{D}_{2} \cup \mathcal{D}_{4} \cup \ldots = \bigcup_{n=0}^{\infty} \mathcal{D}_{n} where \mathcal{D}_{n} = \{ u\in\mathcal{D} \mid |u| = n\}. Note that \mathcal{D}_{n} is empty for odd n. E.g. there are 14 words in \mathcal{D}_{8}, i.e. [[[[]]]], [[[][]]], [[[]][]], [[][[]]], [[[]]][], [[][][]], [][[[]]], [[][]][], [[]][[]], [][[][]], [[]][][], [][[]][], [][][[]], [][][][].

Having introduced the Dyck words of length n, we can introduce a relationship on them. For every n \in \mathbb{N} we define a relation S_{n} on \mathcal{D}_{n}; for u,v\in\mathcal{D}_{n} we have (u,v)\in S_{n} if and only if v can be reached from u by a series of proper swaps. A proper swap in a word u\in\mathcal{D}_{n} swaps an occurrence of '][' with '[]'. For each n\in\mathbb{N} the relation S_{n} makes \mathcal{D}_{n} into a partially ordered set. The relation S_{n} is reflexive because an empty sequence of proper swaps takes u to u. Transitivity follows because we can extend a sequence of proper swaps that takes u to v by concatenating it with a sequence of proper swaps that takes v to w forming a sequence that takes u into w. To see that S_{n} is also antisymmetric we introduce an auxiliary function \sigma_{n}:\mathcal{D}_{n}\rightarrow\mathbb{N}:u\mapsto\sum_{vw=u} imbalance(v) where v ranges over all prefixes of u. The following table illustrates that \sigma_{n} is strictly monotonic with respect to proper swaps.

Strict monotonicity of sigma_{n}
partial sums of sigma_{n}(u) P P-1 P Q
u \ldots ] [ \ldots
u' \ldots [ ] \ldots
partial sums of sigma_{n}(u') P P+1 P Q
Difference of partial sums 0 2 0 0

Hence \sigma_{n}(u') - \sigma_{n}(u) = 2 > 0 so \sigma_{n}(u) < \sigma_{n}(u') when there is a proper swap that takes u into u'. Now if we assume that both (u,v), (v,u)\in S_{n} and u\ne v, then there are non-empty sequences of proper swaps such u is taken into v and vice versa. But then \sigma_{n}(u) < \sigma_{n}(v) < \sigma_{n}(u) which is nonsensical. Therefore, whenever both (u,v) and (v,u) are in S_{n}, we have u = v, hence S_{n} is antisymmetric.

The partial ordered set D_{8} is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.

See also[edit]


  1. ^ Kambites, Communications in Algebra Volume 37 Issue 1 (2009) 193-208
  2. ^ Barrington and Corbett, Information Processing Letters 32 (1989) 251-256