# Pumping lemma for context-free languages

The pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages.

## Formal statement

If a language L is context-free, then there exists some integer p ≥ 1 such that every string s in L with |s| ≥ p (where p is a "pumping length"[1]) can be written as

s = uvxyz

with substrings u, v, x, y and z, such that

1. |vxy| ≤ p,
2. |vy| ≥ 1, and
3. uv nxy nz is in L for all n ≥ 0.

## Informal statement and explanation

The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least p, where p is a constant—called the pumping length—that varies between context-free languages.

Say s is a string of length at least p that is in the language.

The pumping lemma states that s can be split into five substrings, $s = uvxyz$, where vy is non-empty and the length of vxy is at most p, such that repeating v and y any (and the same) number of times in s produces a string that is still in the language (it's possible and often useful to repeat zero times, which removes v and y from the string). This process of "pumping up" additional copies of v and y is what gives the pumping lemma its name.

Note that finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p equal to the maximum string length in L plus one. As there are no strings of this length the pumping lemma is not violated.

The pumping lemma is often used to prove that a given language is non-context-free by showing that for each p, we can find some string s of length at least p in the language that does not have the properties outlined above, i.e. that it cannot be "pumped" without producing some strings that are not in the language.

## Usage of the lemma

The pumping lemma for context-free languages can be used to show that certain languages are not context-free.

For example, we can show that language $L = \lbrace a^ib^ic^i|i>0\rbrace$ is not context-free by using the pumping lemma in a proof by contradiction. First, assume that $L$ is context free. By the pumping lemma, there exists an integer $p$ which is the pumping length of language $L$. Consider the string $s = a^pb^pc^p$ in $L$. The pumping lemma tells us that $s$ can be written in the form $s=uvxyz$, where $u, v, x, y$, and $z$ are substrings, such that $|vxy| \le p$, $|vy|\ge 1$, and $uv^ixy^iz$ is in $L$ for every integer $i\ge 0$. By our choice of $s$ and the fact that $|vxy| \le p$, it is easily seen that the substring $vxy$ can contain no more than two distinct letters. That is, we have one of five possibilities for $vxy$:

1. $vxy = a^j$ for some $j \le p$.
2. $vxy = a^jb^k$ for some $j$ and $k$ with $j+k \le p$.
3. $vxy = b^j$ for some $j \le p$.
4. $vxy = b^jc^k$ for some $j$ and $k$ with $j+k \le p$.
5. $vxy = c^j$ for some $j \le p$.

For each case, it is easily verified that $uv^ixy^iz$ does not contain equal numbers of each letter for any $i\ne 1$. Thus, $uv^2xy^2z$ does not have the form $a^ib^ic^i$. This contradicts the definition of $L$. Therefore, our initial assumption that $L$ is context free must be false.

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma. There are more powerful proof techniques available, such as Ogden's lemma, but also these techniques do not give a complete characterization of the context-free languages.