# Pumping lemma for context-free languages

In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel[clarification needed] lemma, is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages. The pumping lemma does not suffice to guarantee that a language is context-free; there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.

## Formal statement

Proof idea: If ${\displaystyle s}$ is sufficiently long, its derivation tree w.r.t. a Chomsky normal form grammar must contain some nonterminal ${\displaystyle N}$ twice on some tree path (upper picture). Repeating ${\displaystyle n}$ times the derivation part ${\displaystyle N}$ ⇒...⇒ ${\displaystyle vNx}$ obtains a derivation for ${\displaystyle uv^{n}wx^{n}y}$ (lower left and right picture for ${\displaystyle n=0}$ and ${\displaystyle 2}$, respectively).

If a language ${\displaystyle L}$ is context-free, then there exists some integer ${\displaystyle p\geq 1}$ (called a "pumping length"[1]) such that every string ${\displaystyle s}$ in ${\displaystyle L}$ that has a length of ${\displaystyle p}$ or more symbols (i.e. with ${\displaystyle |s|\geq p}$) can be written as

${\displaystyle s=uvwxy}$

with substrings ${\displaystyle u,v,w,x}$ and ${\displaystyle y}$, such that

1. ${\displaystyle |vx|\geq 1}$,
2. ${\displaystyle |vwx|\leq p}$, and
3. ${\displaystyle uv^{n}wx^{n}y\in L}$ for all ${\displaystyle n\geq 0}$.

Below is a formal expression of the Pumping Lemma.

${\displaystyle {\begin{array}{l}(\forall L\subseteq \Sigma ^{*})\\\quad ({\mbox{context free}}(L)\Rightarrow \\\quad ((\exists p\geq 1)((\forall s\in L)((|s|\geq p)\Rightarrow \\\quad ((\exists u,v,w,x,y\in \Sigma ^{*})(s=uvwxy\land |vx|\geq 1\land |vwx|\leq p\land (\forall n\geq 0)(uv^{n}wx^{n}y\in L)))))))\end{array}}}$

## Informal statement and explanation

The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least ${\displaystyle p}$, where ${\displaystyle p}$ is a constant—called the pumping length—that varies between context-free languages.

Say ${\displaystyle s}$ is a string of length at least ${\displaystyle p}$ that is in the language.

The pumping lemma states that ${\displaystyle s}$ can be split into five substrings, ${\displaystyle s=uvwxy}$, where ${\displaystyle vx}$ is non-empty and the length of ${\displaystyle vwx}$ is at most ${\displaystyle p}$, such that repeating ${\displaystyle v}$ and ${\displaystyle x}$ the same number of times (${\displaystyle n}$) in ${\displaystyle s}$ produces a string that is still in the language. It is often useful to repeat zero times, which removes ${\displaystyle v}$ and ${\displaystyle x}$ from the string. This process of "pumping up" additional copies of ${\displaystyle v}$ and ${\displaystyle x}$ is what gives the pumping lemma its name.

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having ${\displaystyle p}$ equal to the maximum string length in ${\displaystyle L}$ plus one. As there are no strings of this length the pumping lemma is not violated.

## Usage of the lemma

The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.

For example, the language ${\displaystyle L=\{a^{n}b^{n}c^{n}|n>0\}}$ can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string ${\displaystyle s=a^{p}b^{p}c^{p}}$ in L. The pumping lemma tells us that s can be written in the form ${\displaystyle s=uvwxy}$, where u, v, w, x, and y are substrings, such that ${\displaystyle |vx|\geq 1}$, ${\displaystyle |vwx|\leq p}$, and ${\displaystyle uv^{i}wx^{i}y\in L}$ for every integer ${\displaystyle i\geq 0}$. By the choice of s and the fact that ${\displaystyle |vwx|\leq p}$, it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:

1. ${\displaystyle vwx=a^{j}}$ for some ${\displaystyle j\leq p}$.
2. ${\displaystyle vwx=a^{j}b^{k}}$ for some j and k with ${\displaystyle j+k\leq p}$
3. ${\displaystyle vwx=b^{j}}$ for some ${\displaystyle j\leq p}$.
4. ${\displaystyle vwx=b^{j}c^{k}}$ for some j and k with ${\displaystyle j+k\leq p}$.
5. ${\displaystyle vwx=c^{j}}$ for some ${\displaystyle j\leq p}$.

For each case, it is easily verified that ${\displaystyle uv^{i}wx^{i}y}$ does not contain equal numbers of each letter for any ${\displaystyle i\neq 1}$. Thus, ${\displaystyle uv^{2}wx^{2}y}$ does not have the form ${\displaystyle a^{i}b^{i}c^{i}}$. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free.

On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example

${\displaystyle L=\{b^{j}c^{k}d^{l}|j,k,l\in \mathbb {N} \}\cup \{a^{i}b^{j}c^{j}d^{j}|i,j\in \mathbb {N} ,i\geq 1\}}$

for s=bjckdl with e.g. j≥1 choose vwx to consist only of b’s, for s=aibjcjdj choose vwx to consist only of a’s; in both cases all pumped strings are still in L.[2]

A precursor of the pumping lemma was used in 1960 by Scheinberg to prove that ${\displaystyle L=\{a^{n}b^{n}a^{n}|n>0\}}$ is not context-free.[3]

## References

1. ^ Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words (PDF). CRM Monograph Series. 27. Providence, RI: American Mathematical Society. p. 90. ISBN 978-0-8218-4480-9. Zbl 1161.68043. (Also see [www-igm.univ-mlv.fr/~berstel/ Aaron Berstel's website)
2. ^ John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Here: sect.6.1, p.129
3. ^ Stephen Scheinberg (1960). "Note on the Boolean Properties of Context Free Languages" (PDF). Information and Control. 3: 372–375. Here: Lemma 3, and its use on p.374-375.
• Bar-Hillel, Y.; Micha Perles; Eli Shamir (1961). "On formal properties of simple phrase-structure grammars". Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung. 14 (2): 143–172. — Reprinted in: Y. Bar-Hillel (1964). Language and Information: Selected Essays on their Theory and Application. Addison-Wesley series in logic. Addison-Wesley. pp. 116–150. ISBN 0201003732. OCLC 783543642.
• Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.