= Watterson estimator =

In population genetics, the Watterson estimator is a method for describing the genetic diversity in a population. It was developed by Margaret Wu and G. A. Watterson in the 1970s. It estimates the "population mutation rate" (the product of the effective population size and the neutral mutation rate) by counting the number of polymorphic sites in a sample of sequences. The quantity being estimated is $\theta = 4N_e\mu$, where $N_e$ is the effective population size and $\mu$ is the per-generation mutation rate of the population of interest. The assumptions made are that there is a sample of $n$ haploid individuals from the population of interest with effective size $N_e$, that $n \ll N_e$, and that there are infinitely many sites capable of varying (so that mutations never overlay or reverse one another).
Because the number of segregating sites counted increases with the number of sequences examined, the count is divided by the correction factor $a_n$.

The estimate of $\theta$, often denoted as $\widehat {\theta\,}_w$, is

 $\widehat {\theta\,}_w = { K \over a_n },$

where $K$ is the number of segregating sites (an example of a segregating site would be a single-nucleotide polymorphism) in the sample and

 $a_n = \sum^{n-1}_{i=1} {1 \over i}$

is the $(n-1)$th harmonic number.
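
The calculation above can be illustrated with a minimal Python sketch. It assumes the sample is given as equal-length aligned strings with no gaps or ambiguous bases. The function names (`harmonic_correction`, `count_segregating_sites`, `watterson_theta`) and the toy alignment are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def harmonic_correction(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def count_segregating_sites(alignment: Sequence[str]) -> int:
    """Count alignment columns in which more than one allele is observed (K)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def watterson_theta(alignment: Sequence[str]) -> float:
    """Watterson's estimate K / a_n for the whole aligned region."""
    k = count_segregating_sites(alignment)
    return k / harmonic_correction(len(alignment))


# Toy alignment (made up for illustration): n = 4 sequences, 8 sites.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
print(watterson_theta(sample))
```

In this toy sample two columns are polymorphic, so $K = 2$, and $a_4 = 1 + 1/2 + 1/3 = 11/6$, giving an estimate of $12/11 \approx 1.09$ for the region. Dividing by the number of sites would give a per-site value instead.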

This estimate is based on coalescent theory. The Watterson estimator is commonly used for its simplicity. When its assumptions are met, the estimator is unbiased, and its variance decreases with increasing sample size or recombination rate. However, the estimator can be biased by population structure or by changes in population size. For example, $\widehat{\theta\,}_w$ is downwardly biased in an exponentially growing population. It can also be biased by violations of the infinite-sites mutational model: multiple point mutations at a single site bias the estimate downward.
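
As a brief supporting sketch of the unbiasedness and variance claims, the standard coalescent results for a non-recombining locus can be stated (not derived) as follows. The symbol $b_n$ is introduced here for convenience and is not used elsewhere in this article. Under the assumptions above, the expected number of segregating sites is

 $E[K] = \theta\, a_n,$

so that $E[\widehat{\theta\,}_w] = \theta$, and the variance of the estimator is

 $\operatorname{Var}(\widehat{\theta\,}_w) = {\theta \over a_n} + {b_n \theta^2 \over a_n^2}, \qquad b_n = \sum^{n-1}_{i=1} {1 \over i^2}.$

Because $a_n$ grows only logarithmically with $n$ while $b_n$ stays bounded, the variance shrinks slowly as more sequences are sampled.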

Comparing the value of Watterson's estimator to the nucleotide diversity ($\pi$) is the basis of Tajima's D, which is used to test whether a DNA sequence is evolving neutrally or under a non-random process (e.g., selection).
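
The comparison can be sketched in Python under the same assumptions as before (equal-length aligned strings, no gaps or ambiguous bases). The sketch computes only the difference $\pi - \widehat{\theta\,}_w$, which is the numerator of Tajima's D. The full statistic also divides by an estimate of the standard deviation of that difference, which is omitted here. All function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def watterson_theta(alignment: Sequence[str]) -> float:
    """K / a_n, with K the number of polymorphic alignment columns."""
    k = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, len(alignment)))
    return k / a_n


def pairwise_diversity(alignment: Sequence[str]) -> float:
    """Nucleotide diversity pi: mean number of differences per sequence pair."""
    pairs = list(combinations(alignment, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def tajima_numerator(alignment: Sequence[str]) -> float:
    """Difference pi - theta_w (numerator of Tajima's D, not the full statistic)."""
    return pairwise_diversity(alignment) - watterson_theta(alignment)


# Toy alignment (made up for illustration): both variants are singletons.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
print(pairwise_diversity(sample), watterson_theta(sample), tajima_numerator(sample))
```

For this toy sample $\pi = 1$ and $\widehat{\theta\,}_w = 12/11$, so the difference is slightly negative, which is the direction expected when the variants in the sample are rare.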

== See also ==

- Coupon collector's problem
- Ewens sampling formula
