# Coupon collector's problem (generating function approach)

The coupon collector's problem can be solved in several different ways. The generating function approach is a combinatorial technique that allows one to obtain precise results.

This technique uses the probability generating function (PGF) $G(z)$ where $[z^q] G(z)$ is the probability that q steps are taken to collect the n coupons i.e. $T=q$

and the expectation is given by

$\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1}.$

$G(z)$ can be calculated explicitly as follows :

$G(z) = \frac{n}{n} z \frac{1}{1-\frac{1}{n} z} \frac{n-1}{n} z \frac{1}{1-\frac{2}{n} z} \frac{n-2}{n} z \frac{1}{1-\frac{3}{n} z} \frac{n-3}{n} z \cdots \frac{1}{1-\frac{n-1}{n} z} \frac{n-(n-1)}{n} z.$

In this equation, It can be found some features :

$\frac{1}{1 - p z} = 1 + p z + p^2 z^2 + p^3 z^3 + \cdots,$

This is the PGF of an event that has probability p occurring zero or more times, with the exponent of z counting the number of times. We split the sequence of coupons into segments. A new segment begins every time a new coupon is retrieved for the first time. The PGF is the product of the PGFs of the individual segments.

Applying this to $G(z)$, It represents the following sequence of events:

• retrieve the first coupon (no restrictions at this time)
• retrieve the first coupon some number of times
• retrieve the second coupon (probability $(n-1)/n$)
• retrieve a mix of the first and second coupons some number of times
• retrieve the third coupon (probability $(n-2)/n$)
• retrieve a mix of the first, second, and third coupons some number of times
• retrieve the fourth coupon (probability $(n-3)/n$)
• $\ldots$
• retrieve the last coupon (probability $(n-(n-1))/n$).

In the following, $H_n$ and $H_n^{(2)}$ denote harmonic numbers.

The function $G(z)$, first simplified before deriving the expectation, can be expressed as follows :

$G(z) = z^n \frac{n-1}{n-z} \frac{n-2}{n-2z} \frac{n-3}{n-3z} \cdots \frac{n-(n-1)}{n-(n-1)z}$.

In this function, the following formula can be used to obtain the derivative of $G(z)$ :

$\frac{\mathrm{d}}{\mathrm{d}z} \frac{n-k}{n-kz} = \frac{k(n-k)}{(n-kz)^2}$

so thus

$\frac{\mathrm{d}}{\mathrm{d}z} G(z) = G(z) \left( \frac{n}{z} + \frac{1}{n-z} + \frac{2}{n-2z} + \frac{3}{n-3z} \cdots + \frac{n-1}{n-(n-1)z} \right)$.

From this equation, it can be found $\operatorname{E}(T)$ as following :

$\operatorname{E}(T) = \left. \frac{\mathrm{d}}{\mathrm{d}z} G(z) \right|_{z=1} = G(1) \left( n + \frac{1}{n-1} + \frac{2}{n-2} + \frac{3}{n-3} \cdots + \frac{n-1}{n-(n-1)} \right)$

or

$\operatorname{E}(T) = n + \sum_{k=1}^{n-1} \frac{k}{n-k}.$

The next formula is useful for simplifying the $\operatorname{E}(T)$:

$\sum_{k=1}^{n-1} \frac{k}{n-k} = \sum_{k=1}^{n-1} \left( \frac{k}{n-k} - \frac{n}{n-k} \right) + n H_{n-1} = n H_{n-1} - (n-1)$

so that

$\operatorname{E}(T) = n + n H_{n-1} - (n-1) = n H_{n-1} + 1 = n H_n, \quad \mbox{QED.}$

The PGF $G(z)$ makes it possible to obtain an exact value for the variance.

$\operatorname{Var}(T) = \operatorname{E}(T(T-1)) + \operatorname{E}(T) - \operatorname{E}(T)^2,$

This formula consists entirely of factorial moments that can be calculated from the PGF. The value of $\operatorname{E}(T)$ is already found.

The remainder can be calculated as follows :

$\operatorname{E}(T(T-1)) = \left. \left( \frac{\mathrm{d}}{\mathrm{d}z} \right)^2 G(z) \right|_{z=1}.$

This derivative is

\begin{align} & G(z) \left( \frac{n}{z} + \frac{1}{n-z} + \frac{2}{n-2z} + \frac{3}{n-3z} \cdots + \frac{n-1}{n-(n-1)z} \right)^2 \\ + & \; G(z) \left( - \frac{n}{z^2} + \frac{1^2}{(n-z)^2} + \frac{2^2}{(n-2z)^2} + \frac{3^2}{(n-3z)^2} \cdots + \frac{(n-1)^2}{(n-(n-1)z)^2} \right) \end{align}

Evaluation at $z=1$ yields

\begin{align} & n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{k^2}{(n-k)^2} = n^2 H_n^2 - n + \sum_{k=1}^{n-1} \frac{(n-k)^2}{k^2} \\ = & \; n^2 H_n^2 - n + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + (n-1). \end{align}

As all value has been found, the conclusion can be reached at the following:

\begin{align} \operatorname{Var}(T) & = \; n^2 H_n^2 - 1 + n^2 H_{n-1}^{(2)} - 2 n H_{n-1} + n H_{n-1} + 1 - n^2 H_n^2 \\ & = \; n^2 H_{n-1}^{(2)} - n H_{n-1} < \frac{\pi^2}{6} n^2, \quad \text{QED.} \end{align}