Coupon collector's problem (generating function approach)

From Wikipedia, the free encyclopedia
Jump to: navigation, search

The coupon collector's problem can be solved in several different ways. The generating function approach is a combinatorial technique that allows one to obtain precise results.

This technique uses the probability generating function (PGF) G(z) where [z^q] G(z) is the probability that q steps are taken to collect the n coupons i.e. T=q

and the expectation is given by

\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}.

G(z) can be calculated explicitly as follows :

 G(z) =
\frac{n}{n} z
\frac{1}{1-\frac{1}{n} z}
\frac{n-1}{n} z
\frac{1}{1-\frac{2}{n} z}
\frac{n-2}{n} z
\frac{1}{1-\frac{3}{n} z}
\frac{n-3}{n} z
\frac{1}{1-\frac{n-1}{n} z}
\frac{n-(n-1)}{n} z.

In this equation, It can be found some features :

 \frac{1}{1 - p z} = 1 + p z + p^2 z^2 + p^3 z^3 + \cdots,

This is the PGF of an event that has probability p occurring zero or more times, with the exponent of z counting the number of times. We split the sequence of coupons into segments. A new segment begins every time a new coupon is retrieved for the first time. The PGF is the product of the PGFs of the individual segments.

Applying this to G(z), It represents the following sequence of events:

  • retrieve the first coupon (no restrictions at this time)
  • retrieve the first coupon some number of times
  • retrieve the second coupon (probability (n-1)/n)
  • retrieve a mix of the first and second coupons some number of times
  • retrieve the third coupon (probability (n-2)/n)
  • retrieve a mix of the first, second, and third coupons some number of times
  • retrieve the fourth coupon (probability (n-3)/n)
  • \ldots
  • retrieve the last coupon (probability (n-(n-1))/n).

In the following, H_n and H_n^{(2)} denote harmonic numbers.

The function G(z), first simplified before deriving the expectation, can be expressed as follows :

 G(z) = z^n

In this function, the following formula can be used to obtain the derivative of G(z) :

 \frac{\mathrm{d}}{\mathrm{d}z} \frac{n-k}{n-kz} = \frac{k(n-k)}{(n-kz)^2}

so thus

\frac{\mathrm{d}}{\mathrm{d}z} G(z) =
+ \frac{1}{n-z}
+ \frac{2}{n-2z}
+ \frac{3}{n-3z}
+ \frac{n-1}{n-(n-1)z}

From this equation, it can be found \operatorname{E}(T) as following :

\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}
= G(1)
+ \frac{1}{n-1}
+ \frac{2}{n-2}
+ \frac{3}{n-3}
+ \frac{n-1}{n-(n-1)}


\operatorname{E}(T) = n + \sum_{k=1}^{n-1} \frac{k}{n-k}.

The next formula is useful for simplifying the \operatorname{E}(T):

 \sum_{k=1}^{n-1} \frac{k}{n-k} =
\sum_{k=1}^{n-1} \left( \frac{k}{n-k} - \frac{n}{n-k} \right) + n H_{n-1} =
n H_{n-1} - (n-1)

so that

 \operatorname{E}(T) = n + n H_{n-1} - (n-1) = n H_{n-1} + 1 = n H_n,
\quad \mbox{QED.}

The PGF G(z) makes it possible to obtain an exact value for the variance.

\operatorname{Var}(T) =
\operatorname{E}(T(T-1)) + \operatorname{E}(T) - \operatorname{E}(T)^2,

This formula consists entirely of factorial moments that can be calculated from the PGF. The value of \operatorname{E}(T) is already found.

The remainder can be calculated as follows :

\operatorname{E}(T(T-1)) =
\left. \left( \frac{\mathrm{d}}{\mathrm{d}z} \right)^2 G(z) \right|_{z=1}.

This derivative is

& G(z)
+ \frac{1}{n-z}
+ \frac{2}{n-2z}
+ \frac{3}{n-3z}
+ \frac{n-1}{n-(n-1)z}
\right)^2 \\
+ & \;
- \frac{n}{z^2}
+ \frac{1^2}{(n-z)^2}
+ \frac{2^2}{(n-2z)^2}
+ \frac{3^2}{(n-3z)^2}
+ \frac{(n-1)^2}{(n-(n-1)z)^2}

Evaluation at z=1 yields

\begin{align} &
n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{k^2}{(n-k)^2} =
n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{(n-k)^2}{k^2} \\ = & \;
n^2 H_n^2 - n + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + (n-1).

As all value has been found, the conclusion can be reached at the following:

\operatorname{Var}(T) & = \;
n^2 H_n^2 - 1 + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + n H_{n-1} + 1 - n^2 H_n^2 \\ & = \;
n^2 H_{n-1}^{(2)} - n H_{n-1} < \frac{\pi^2}{6} n^2, \quad \text{QED.}