# Dyck language

In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, D1, uses just two matching brackets, e.g. ( and ).

Dyck words and language are named after the mathematician Walther von Dyck. They have applications in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.

## Formal definition

Let ${\displaystyle \Sigma =\{[,]\}}$ be the alphabet consisting of the symbols [ and ]. Let ${\displaystyle \Sigma ^{*}}$ denote its Kleene closure. The Dyck language is defined as:

${\displaystyle \{u\in \Sigma ^{*}\vert {\text{ all prefixes of }}u{\text{ contain no more ]'s than ['s}}{\text{ and the number of ['s in }}u{\text{ equals the number of ]'s}}\}.}$

### Context-free grammar

It may be helpful to define the Dyck language via a context-free grammar in some situations. The Dyck language is generated by the context-free grammar with a single non-terminal S, and the production:

Sε | "[" S "]" S

That is, S is either the empty string (ε) or is "[", an element of the Dyck language, the matching "]", and an element of the Dyck language.

An alternative context-free grammar for the Dyck language is given by the production:

S → ("[" S "]")*

That is, S is zero or more occurrences of the combination of "[", an element of the Dyck language, and a matching "]", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.

### Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting ${\displaystyle \Sigma ^{*}}$ into equivalence classes, as follows. For any element ${\displaystyle u\in \Sigma ^{*}}$ of length ${\displaystyle |u|}$, we define partial functions ${\displaystyle \operatorname {insert} :\Sigma ^{*}\times \mathbb {N} \rightarrow \Sigma ^{*}}$ and ${\displaystyle \operatorname {delete} :\Sigma ^{*}\times \mathbb {N} \rightarrow \Sigma ^{*}}$ by

${\displaystyle \operatorname {insert} (u,j)}$ is ${\displaystyle u}$ with "${\displaystyle []}$" inserted into the ${\displaystyle j}$th position
${\displaystyle \operatorname {delete} (u,j)}$ is ${\displaystyle u}$ with "${\displaystyle []}$" deleted from the ${\displaystyle j}$th position

with the understanding that ${\displaystyle \operatorname {insert} (u,j)}$ is undefined for ${\displaystyle j>|u|}$ and ${\displaystyle \operatorname {delete} (u,j)}$ is undefined if ${\displaystyle j>|u|-2}$. We define an equivalence relation ${\displaystyle R}$ on ${\displaystyle \Sigma ^{*}}$ as follows: for elements ${\displaystyle a,b\in \Sigma ^{*}}$ we have ${\displaystyle (a,b)\in R}$ if and only if there exists a sequence of zero or more applications of the ${\displaystyle \operatorname {insert} }$ and ${\displaystyle \operatorname {delete} }$ functions starting with ${\displaystyle a}$ and ending with ${\displaystyle b}$. That the sequence of zero operations is allowed accounts for the reflexivity of ${\displaystyle R}$. Symmetry follows from the observation that any finite sequence of applications of ${\displaystyle \operatorname {insert} }$ to a string can be undone with a finite sequence of applications of ${\displaystyle \operatorname {delete} }$. Transitivity is clear from the definition.

The equivalence relation partitions the language ${\displaystyle \Sigma ^{*}}$ into equivalence classes. If we take ${\displaystyle \epsilon }$ to denote the empty string, then the language corresponding to the equivalence class ${\displaystyle \operatorname {Cl} (\epsilon )}$ is called the Dyck language.

## Properties

• The Dyck language is closed under the operation of concatenation.
• By treating ${\displaystyle \Sigma ^{*}}$ as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient ${\displaystyle \Sigma ^{*}/R}$, resulting in the syntactic monoid of the Dyck language. The class ${\displaystyle \operatorname {Cl} (\epsilon )}$ will be denoted ${\displaystyle 1}$.
• The syntactic monoid of the Dyck language is not commutative: if ${\displaystyle u=\operatorname {Cl} ([)}$ and ${\displaystyle v=\operatorname {Cl} (])}$ then ${\displaystyle uv=\operatorname {Cl} ([])=1\neq \operatorname {Cl} (][)=vu}$.
• With the notation above, ${\displaystyle uv=1}$ but neither ${\displaystyle u}$ nor ${\displaystyle v}$ are invertible in ${\displaystyle \Sigma ^{*}/R}$.
• The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of ${\displaystyle \operatorname {Cl} ([)}$ and ${\displaystyle \operatorname {Cl} (])}$ described above.
• By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a Dyck language on one or more kinds of bracket pairs.[1]
• The Dyck language with two distinct types of brackets can be recognized in the complexity class ${\displaystyle TC^{0}}$.[2]
• The number of distinct Dyck words with exactly n pairs of parentheses and k innermost pairs (viz. the substring ${\displaystyle [\ ]}$) is the Narayana number ${\displaystyle \operatorname {N} (n,k)}$.
• The number of distinct Dyck words with exactly n pairs of parentheses is the n-th Catalan number ${\displaystyle C_{n}}$. Notice that the Dyck language of words with n parentheses pairs is equal to the union, over all possible k, of the Dyck languages of words of n parentheses pairs with k innermost pairs, as defined in the previous point. Since k can range from 0 to n, we obtain the following equality, which indeed holds:
${\displaystyle C_{n}=\sum _{k=1}^{n}\operatorname {N} (n,k)}$

## Examples

We can define an equivalence relation ${\displaystyle L}$ on the Dyck language ${\displaystyle {\mathcal {D}}}$. For ${\displaystyle u,v\in {\mathcal {D}}}$ we have ${\displaystyle (u,v)\in L}$ if and only if ${\displaystyle |u|=|v|}$, i.e. ${\displaystyle u}$ and ${\displaystyle v}$ have the same length. This relation partitions the Dyck language: ${\displaystyle {\mathcal {D}}/L=\{{\mathcal {D}}_{0},{\mathcal {D}}_{1},\ldots \}}$. We have ${\displaystyle {\mathcal {D}}={\mathcal {D}}_{0}\cup {\mathcal {D}}_{2}\cup {\mathcal {D}}_{4}\cup \ldots =\bigcup _{n=0}^{\infty }{\mathcal {D}}_{n}}$ where ${\displaystyle {\mathcal {D}}_{n}=\{u\in {\mathcal {D}}\mid |u|=n\}}$. Note that ${\displaystyle {\mathcal {D}}_{n}}$ is empty for odd ${\displaystyle n}$.

Having introduced the Dyck words of length ${\displaystyle n}$, we can introduce a relationship on them. For every ${\displaystyle n\in \mathbb {N} }$ we define a relation ${\displaystyle S_{n}}$ on ${\displaystyle {\mathcal {D}}_{n}}$; for ${\displaystyle u,v\in {\mathcal {D}}_{n}}$ we have ${\displaystyle (u,v)\in S_{n}}$ if and only if ${\displaystyle v}$ can be reached from ${\displaystyle u}$ by a series of proper swaps. A proper swap in a word ${\displaystyle u\in {\mathcal {D}}_{n}}$ swaps an occurrence of '][' with '[]'. For each ${\displaystyle n\in \mathbb {N} }$ the relation ${\displaystyle S_{n}}$ makes ${\displaystyle {\mathcal {D}}_{n}}$ into a partially ordered set. The relation ${\displaystyle S_{n}}$ is reflexive because an empty sequence of proper swaps takes ${\displaystyle u}$ to ${\displaystyle u}$. Transitivity follows because we can extend a sequence of proper swaps that takes ${\displaystyle u}$ to ${\displaystyle v}$ by concatenating it with a sequence of proper swaps that takes ${\displaystyle v}$ to ${\displaystyle w}$ forming a sequence that takes ${\displaystyle u}$ into ${\displaystyle w}$. To see that ${\displaystyle S_{n}}$ is also antisymmetric we introduce an auxiliary function ${\displaystyle \sigma _{n}:{\mathcal {D}}_{n}\rightarrow \mathbb {N} }$ defined as a sum over all prefixes ${\displaystyle v}$ of ${\displaystyle u}$:

${\displaystyle \sigma _{n}(u)=\sum _{vw=u}{\Big (}({\text{count of ['s in }}v)-({\text{count of ]'s in }}v){\Big )}}$

The following table illustrates that ${\displaystyle \sigma _{n}}$ is strictly monotonic with respect to proper swaps.

partial sums of ${\displaystyle \sigma _{n}(u)}$ ${\displaystyle u}$ ${\displaystyle u'}$ partial sums of ${\displaystyle \sigma _{n}(u')}$ ${\displaystyle P}$ ${\displaystyle P-1}$ ${\displaystyle P}$ ${\displaystyle Q}$ ${\displaystyle \ldots }$ ] [ ${\displaystyle \ldots }$ ${\displaystyle \ldots }$ [ ] ${\displaystyle \ldots }$ ${\displaystyle P}$ ${\displaystyle P+1}$ ${\displaystyle P}$ ${\displaystyle Q}$ 0 2 0 0

Hence ${\displaystyle \sigma _{n}(u')-\sigma _{n}(u)=2>0}$ so ${\displaystyle \sigma _{n}(u)<\sigma _{n}(u')}$ when there is a proper swap that takes ${\displaystyle u}$ into ${\displaystyle u'}$. Now if we assume that both ${\displaystyle (u,v),(v,u)\in S_{n}}$ and ${\displaystyle u\neq v}$, then there are non-empty sequences of proper swaps such ${\displaystyle u}$ is taken into ${\displaystyle v}$ and vice versa. But then ${\displaystyle \sigma _{n}(u)<\sigma _{n}(v)<\sigma _{n}(u)}$ which is nonsensical. Therefore, whenever both ${\displaystyle (u,v)}$ and ${\displaystyle (v,u)}$ are in ${\displaystyle S_{n}}$, we have ${\displaystyle u=v}$, hence ${\displaystyle S_{n}}$ is antisymmetric.

The partial ordered set ${\displaystyle D_{8}}$ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.

## Generalizations

There exist variants of the Dyck language with multiple delimiters, e.g., D2 on the alphabet "(", ")", "[", and "]". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter on the stack, and whenever we reach a closing delimiter then we must be able to pop the matching opening delimiter from the top of the stack. (The counting algorithm above does not generalise).