# Pumping lemma for context-free languages

In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel[clarification needed] lemma, is a lemma that gives a property shared by all context-free languages. It generalizes the pumping lemma for regular languages.

## Formal statement

Proof idea: If s is sufficiently long, its derivation tree w.r.t. a Chomsky normal form grammar must contain some nonterminal N twice on some tree path (upper picture). Repeating n times the derivation part N ⇒...⇒ vNx obtains a derivation for uvnwxny (lower left and right picture for n=0 and 2, respectively).

If a language L is context-free, then there exists some integer p ≥ 1 (called a "pumping length"[1]) such that every string s in L that is longer than or equal to p symbols (i.e. with |s| ≥ p) can be written as

s = uvwxy

with substrings u, v, w, x and y, such that

1. |vwx| ≤ p,
2. |vx| ≥ 1, and
3. uvnwxny is in L for all n ≥ 0.

## Informal statement and explanation

The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least p, where p is a constant—called the pumping length—that varies between context-free languages.

Say s is a string of length at least p that is in the language.

The pumping lemma states that s can be split into five substrings, s = uvwxy, where vx is non-empty and the length of vwx is at most p, such that repeating v and x any (and the same) number of times in s produces a string that is still in the language (it is possible and often useful to repeat zero times, which removes v and x from the string). This process of "pumping up" additional copies of v and x is what gives the pumping lemma its name.

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p equal to the maximum string length in L plus one. As there are no strings of this length the pumping lemma is not violated.

## Usage of the lemma

The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.

For example, the language L = { anbncn | n > 0 } can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string s = apbpcp in L. The pumping lemma tells us that s can be written in the form s = uvwxy, where u, v, w, x, and y are substrings, such that |vwx| ≤ p, |vx| ≥ 1, and uviwxiy is in L for every integer i ≥ 0. By the choice of s and the fact that |vwx| ≤ p, it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:

1. vwx = aj for some jp.
2. vwx = ajbk for some j and k with j+kp.
3. vwx = bj for some jp.
4. vwx = bjck for some j and k with j+kp.
5. vwx = cj for some jp.

For each case, it is easily verified that uviwxiy does not contain equal numbers of each letter for any i ≠ 1. Thus, uv2wx2y does not have the form aibici. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free.

On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example L = { bjckdl | j, k, l ∈ ℕ } ∪ { aibjcjdj | i, j ∈ ℕ, i≥1 }: for s=bjckdl with e.g. j≥1 choose vwx to consist only of b’s, for s=aibjcjdj choose vwx to consist only of a’s; in both cases all pumped strings are still in L.[2] There are more powerful proof techniques available, such as Ogden's lemma, but also these techniques do not give a complete characterization of the context-free languages.

## References

1. ^ Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words. CRM Monograph Series 27. Providence, RI: American Mathematical Society. p. 90. ISBN 978-0-8218-4480-9. Zbl 1161.68043.
2. ^ John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Here: sect.6.1, p.129
• Bar-Hillel, Y.; Micha Perles, M. and Eli Shamir (1961). "On formal properties of simple phrase-structure grammars". Zeitschrift fur Phonetik, Sprachwissenschaft, und Kommunikationsforschung 14 (2): 143–172. — Reprinted in: Y. Bar-Hillel (1964). Language and Information: Selected Essays on their Theory and Application. Addison-Wesley series in logic. Addison-Wesley. pp. 116–150. ISBN 0201003732. OCLC 783543642.
• Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.