# Coupling from the past

Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Contrary to many MCMC algorithms, coupling from the past gives in principle a perfect sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.

## The basic idea

Consider a finite state irreducible aperiodic Markov chain ${\displaystyle M}$ with state space ${\displaystyle S}$ and (unique) stationary distribution ${\displaystyle \pi }$ (${\displaystyle \pi }$ is a probability vector). Suppose that we come up with a probability distribution ${\displaystyle \mu }$ on the set of maps ${\displaystyle f:S\to S}$ with the property that for every fixed ${\displaystyle s\in S}$, its image ${\displaystyle f(s)}$ is distributed according to the transition probability of ${\displaystyle M}$ from state ${\displaystyle s}$. An example of such a probability distribution is the one where ${\displaystyle f(s)}$ is independent from ${\displaystyle f(s')}$ whenever ${\displaystyle s\neq s'}$, but it is often worthwhile to consider other distributions. Now let ${\displaystyle f_{j}}$ for ${\displaystyle j\in \mathbb {Z} }$ be independent samples from ${\displaystyle \mu }$.

Suppose that ${\displaystyle x}$ is chosen randomly according to ${\displaystyle \pi }$ and is independent from the sequence ${\displaystyle f_{j}}$. (We do not worry for now where this ${\displaystyle x}$ is coming from.) Then ${\displaystyle f_{-1}(x)}$ is also distributed according to ${\displaystyle \pi }$, because ${\displaystyle \pi }$ is ${\displaystyle M}$-stationary and our assumption on the law of ${\displaystyle f}$. Define

${\displaystyle F_{j}:=f_{-1}\circ f_{-2}\circ \cdots \circ f_{-j}.}$

Then it follows by induction that ${\displaystyle F_{j}(x)}$ is also distributed according to ${\displaystyle \pi }$ for every ${\displaystyle j\in \mathbb {N} }$. Now here is the main point. It may happen that for some ${\displaystyle n\in \mathbb {N} }$ the image of the map ${\displaystyle F_{n}}$ is a single element of ${\displaystyle S}$. In other words, ${\displaystyle F_{n}(x)=F_{n}(y)}$ for each ${\displaystyle y\in S}$. Therefore, we do not need to have access to ${\displaystyle x}$ in order to compute ${\displaystyle F_{n}(x)}$. The algorithm then involves finding some ${\displaystyle n\in \mathbb {N} }$ such that ${\displaystyle F_{n}(S)}$ is a singleton, and outputing the element of that singleton. The design of a good distribution ${\displaystyle \mu }$ for which the task of finding such an ${\displaystyle n}$ and computing ${\displaystyle F_{n}}$ is not too costly is not always obvious, but has been accomplished successfully in several important instances[citation needed].

## The monotone case

There is a special class of Markov chains in which there are particularly good choices for ${\displaystyle \mu }$ and a tool for determining if ${\displaystyle |F_{n}(S)|=1}$. (Here ${\displaystyle |\cdot |}$ denotes cardinality.) Suppose that ${\displaystyle S}$ is a partially ordered set with order ${\displaystyle \leq }$, which has a unique minimal element ${\displaystyle s_{0}}$ and a unique maximal element ${\displaystyle s_{1}}$; that is, every ${\displaystyle s\in S}$ satisfies ${\displaystyle s_{0}\leq s\leq s_{1}}$. Also, suppose that ${\displaystyle \mu }$ may be chosen to be supported on the set of monotone maps ${\displaystyle f:S\to S}$. Then it is easy to see that ${\displaystyle |F_{n}(S)|=1}$ if and only if ${\displaystyle F_{n}(s_{0})=F_{n}(s_{1})}$, since ${\displaystyle F_{n}}$ is monotone. Thus, checking this becomes rather easy. The algorithm can proceed by choosing ${\displaystyle n:=n_{0}}$ for some constant ${\displaystyle n_{0}}$, sampling the maps ${\displaystyle f_{-1},\dots ,f_{-n}}$, and outputing ${\displaystyle F_{n}(s_{0})}$ if ${\displaystyle F_{n}(s_{0})=F_{n}(s_{1})}$. If ${\displaystyle F_{n}(s_{0})\neq F_{n}(s_{1})}$ the algorithm proceeds by doubling ${\displaystyle n}$ and repeating as necessary until an output is obtained. (But the algorithm does not resample the maps ${\displaystyle f_{-j}}$ which were already sampled; it uses the previously sampled maps when needed.)

## References

• Propp, James Gary; Wilson, David Bruce (1996), Proceedings of the Seventh International Conference on Random Structures and Algorithms (Atlanta, GA, 1995), pp. 223–252, MR 1611693
• Propp, James; Wilson, David (1998), "Coupling from the past: a user's guide" (PDF), Microsurveys in discrete probability (Princeton, NJ, 1997), DIMACS Ser. Discrete Math. Theoret. Comput. Sci., 41, Providence, R.I.: American Mathematical Society, pp. 181–192, MR 1630414