# Dyck language

Lattice of the 14 Dyck words of length 8 - [ and ] interpreted as up and down

In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language is the language consisting of balanced strings of square brackets [ and ]. It is important in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions. It is named after the mathematician Walther von Dyck.

## Formal definition

Let $\Sigma = \{ [, ] \}$ be the alphabet consisting of the symbols [ and ] and let $\Sigma^{*}$ denote its Kleene closure. For any element $u \in \Sigma^{*}$ with length $| u |$ we define partial functions $insert : \Sigma^{*} \times \mathbb{N} \cup \{ 0 \} \rightarrow \Sigma^{*}$ and $delete : \Sigma^{*} \times \mathbb{N} \rightarrow \Sigma^{*}$ by

$insert(u, j)$ is $u$ with "$[]$" inserted into the $j$th position
$delete(u, j)$ is $u$ with "$[]$" deleted from the $j$th position

with the understanding that $insert(u, j)$ is undefined for $j > |u|$ and $delete(u, j)$ is undefined if $j > |u| - 2$. We define an equivalence relation $R$ on $\Sigma^{*}$ as follows: for elements $a, b \in \Sigma^{*}$ we have $(a, b) \in R$ if and only if there exists a finite sequence of applications of the $insert$ and $delete$ functions starting with $a$ and ending with $b$, where the empty sequence is allowed. That the empty sequence is allowed accounts for the reflexivity of $R$. Symmetry follows from the observation that any finite sequence of applications of $insert$ to a string can be undone with a finite sequence of applications of $delete$. Transitivity is clear from the definition.

The equivalence relation partitions the language $\Sigma^{*}$ into equivalence classes. If we take $\epsilon$ to denote the empty string, then the language corresponding to the equivalence class $\operatorname{Cl}(\epsilon)$ is called the Dyck language.

### Alternative definition

An alternative definition of the Dyck language can be formulated when we introduce the $imbalance : \Sigma^{*} \rightarrow \left(\mathbb{N} \cup \{0\}\right)$ function.

$imbalance ( u ) = |u|_{[} - |u|_{]}$ for any $u \in \Sigma^{*}$.

where $|u|_{[}$ and $|u|_{]}$ are respectively the number of [ and ] in $u$. I.e. $imbalance$ counts the imbalance of [ over ]. If $imbalance(u)$ is positive then $u$ has more [ than ].

Now, the Dyck language can be defined as the language

$\{ u \in \Sigma^{*} \vert imbalance(u) = 0 \text{ and } imbalance(v) \ge 0 \text{ for all prefixes } v \text{ of } u \}$

## Properties

• The Dyck language is closed under the operation of concatenation.
• By treating $\Sigma^{*}$ as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient $\Sigma^{*} / R$, resulting in the syntactic monoid of the Dyck language. The class $\operatorname{Cl}(\epsilon)$ will be denoted $1$.
• The syntactic monoid of the Dyck language is not commutative: if $u = \operatorname{Cl}([)$ and $v = \operatorname{Cl}(])$ then $uv = \operatorname{Cl}([]) = 1 \ne \operatorname{Cl}(][) = vu$.
• With the notation above, $uv = 1$ but neither $u$ nor $v$ are invertible in $\Sigma^{*} / R$.
• The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of $\operatorname{Cl}([)$ and $\operatorname{Cl}(])$ described above.
• By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a homomorphic preimage of the Dyck language on two brackets.[1]
• The Dyck language with two distinct types of brackets can be recognized in the complexity class $TC^{0}$.[2]

## Examples

We can define an equivalence relation $L$ on the Dyck language $\mathcal{D}$. For $u,v\in\mathcal{D}$ we have $(u,v)\in L$ if and only if $|u| = |v|$, i.e. $u$ and $v$ have the same length. This relation partitions the Dyck language $\mathcal{D} / L = \mathcal{D}_{0} \cup \mathcal{D}_{2} \cup \mathcal{D}_{4} \cup \ldots = \bigcup_{n=0}^{\infty} \mathcal{D}_{n}$ where $\mathcal{D}_{n} = \{ u\in\mathcal{D} \mid |u| = n\}$. Note that $\mathcal{D}_{n}$ is empty for odd $n$. E.g. there are 14 words in $\mathcal{D}_{8}$, i.e. [[[[]]]], [[[][]]], [[[]][]], [[][[]]], [[[]]][], [[][][]], [][[[]]], [[][]][], [[]][[]], [][[][]], [[]][][], [][[]][], [][][[]], [][][][].

Having introduced the Dyck words of length $n$, we can introduce a relationship on them. For every $n \in \mathbb{N}$ we define a relation $S_{n}$ on $\mathcal{D}_{n}$; for $u,v\in\mathcal{D}_{n}$ we have $(u,v)\in S_{n}$ if and only if $v$ can be reached from $u$ by a series of proper swaps. A proper swap in a word $u\in\mathcal{D}_{n}$ swaps an occurrence of '][' with '[]'. For each $n\in\mathbb{N}$ the relation $S_{n}$ makes $\mathcal{D}_{n}$ into a partially ordered set. The relation $S_{n}$ is reflexive because an empty sequence of proper swaps takes $u$ to $u$. Transitivity follows because we can extend a sequence of proper swaps that takes $u$ to $v$ by concatenating it with a sequence of proper swaps that takes $v$ to $w$ forming a sequence that takes $u$ into $w$. To see that $S_{n}$ is also antisymmetric we introduce an auxiliary function $\sigma_{n}:\mathcal{D}_{n}\rightarrow\mathbb{N}:u\mapsto\sum_{vw=u} imbalance(v)$ where $v$ ranges over all prefixes of $u$. The following table illustrates that $\sigma_{n}$ is strictly monotonic with respect to proper swaps.

partial sums of $sigma_{n}(u)$ $u$ $u'$ partial sums of $sigma_{n}(u')$ $P$ $P-1$ $P$ $Q$ $\ldots$ ] [ $\ldots$ $\ldots$ [ ] $\ldots$ $P$ $P+1$ $P$ $Q$ 0 2 0 0

Hence $\sigma_{n}(u') - \sigma_{n}(u) = 2 > 0$ so $\sigma_{n}(u) < \sigma_{n}(u')$ when there is a proper swap that takes $u$ into $u'$. Now if we assume that both $(u,v), (v,u)\in S_{n}$ and $u\ne v$, then there are non-empty sequences of proper swaps such $u$ is taken into $v$ and vice versa. But then $\sigma_{n}(u) < \sigma_{n}(v) < \sigma_{n}(u)$ which is nonsensical. Therefore, whenever both $(u,v)$ and $(v,u)$ are in $S_{n}$, we have $u = v$, hence $S_{n}$ is antisymmetric.

The partial ordered set $D_{8}$ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.