# The Pumping Lemma for Regular Languages

There are two versions of the Pumping Lemma. One is for context free grammars and one is for regular languages. This post is about the latter. The Pumping Lemma describes a property that all natural languages share. While it cannot be used by itself to prove that any given language is regular, it can be used to prove, often using proof by contradiction, that a language is not regular.  In this sense the Pumping Lemma provides a necessary condition for a language to be regular but not a sufficient one.

## The Pumping Lemma for Regular Languages

Let L be a regular language. There exists an integer p ≥ 1, that depends only on L, such that for every string w in L of length greater than or equal to p, w =xyz (w is the concatenation of strings x, y, and z). For all such w the following conditions are satisfied:

1. |y| ≥ 1,
2. |xy| ≤ p, and
3. xynz ∈ L for all n ≥ 0

In a nutshell, in all regular languages any string w that is longer than p, called the pumping length, can be viewed as the concatenation of three strings: x, y, and z, w=xyz. Rule 1 tells us that y must exist. Rule 2 gives us a constraint on how long xy can be, and rule 3 states that in addition to xyz being in L, so is xynz for all n ≥ 0, ie., xz, xyz, xyyz, xyyyx, etc.  The process of repeating the y portion of the string is known as pumping.

### Proof

By Kleene’s Theorem if a language L is regular then there exists a deterministic finite state automaton (DFA) M such that L = L(M). Let the pumping length p be equal to the number of states in M. For all strings of length p or greater, let q0 be the start state and q1, q2, …, qp be the next p states visited as the string is consumed. This sequence is p + 1 states long, but since M only has p states, at least one state must have been repeated.  We will call the state that was repeated qs.

Note that in the above diagram we have illustrated the loop with two states qs and qt, but if y is only one symbol in length then qs ≡ qt.

By assumption the string xyz ∈ L, so we have now shown that:

1. xynz ∈ L for all n ≥ 0 (Because we can repeat the loop ad infinitum before reaching the final state.)
2. |y| ≥ 1 (Because there must be at least one state in the path that is repeated.)
3. |xy| ≤ p (Because there only p states xy can’t traverse more than p states without hitting a loop.)

### Example

Given L = {anbn | n ≥ 0} show that L is not a regular language.

1. Assume that L is a regular language.
2. By the Pumping Lemma there exists a pumping length p.
3. Consider the string s = apbp.
1. s satisfies |s| ≥ p since |s| = 2p.
4. For any x, y, z such that s = xyz and |y| ≥ 1 and |xy| ≤ p, y will consist solely of a’s.
5. Suppose i = 2. xyiz = xyyz = ap+|y|bp . Since |y| ≥ 1, the number of a’s in the string is greater the number of b’s. xyiz is not in L. this contradicts the Pumping Lemma so L is not a regular language.

### Converse of the Pumping Lemma

The conversion of the Pumping Lemma is to say that any language that satisfies the conditions:

1. xynz ∈ L for all n ≥ 0
2. |y| ≥ 1
3. |xy| ≤ p

is a regular language. Unfortunately, this is not true. It can be shown that there are languages that satisfy the pumping lemma but are not regular. This means that satisfying the Pumping Lemma is not sufficient for a language to be regular.

### Summary

If we let p be the proposition that language L satisfies the Pumping Lemma and r is the proposition that language L is regular:

Being a regular language implies satisfying the Pumping Lemma, ie., satisfying the Pumping Lemma is necessary to be a regular language.

Satisfying the Pumping Lemma does not imply being a regular language, ie., satisfying the Pumping Lemma is not sufficient for being a regular language.

If you want a necessary and sufficient condition for a regular language, then you need the Myhill-Nerode Theorem, which, coincidentally enough, is what my next post will be about.

This site uses Akismet to reduce spam. Learn how your comment data is processed.