Ewens's sampling formula

In population genetics, Ewens's sampling formula describes the probabilities associated with counts of how many different alleles are observed a given number of times in a sample.

Definition

Ewens's sampling formula, introduced by Warren Ewens, states that under certain conditions (specified below), if a random sample of n gametes is taken from a population and classified according to the gene at a particular locus, then the probability that there are a_1 alleles represented once in the sample, a_2 alleles represented twice, and so on, is

\operatorname{Pr}(a_1,\dots,a_n; \theta)={n! \over \theta(\theta+1)\cdots(\theta+n-1)}\prod_{j=1}^n{\theta^{a_j} \over j^{a_j} a_j!},

for some positive number θ, whenever a_1, ..., a_n is a sequence of nonnegative integers such that

a_1+2a_2+3a_3+\cdots+na_n=n.\,
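
As an illustration, the formula can be evaluated directly. The following is a minimal Python sketch, not a reference implementation; the function name ewens_pmf and the choice of exact rational arithmetic are assumptions made here for clarity.

```python
from fractions import Fraction
from math import factorial


def ewens_pmf(a, theta):
    """Probability of the count vector a = (a_1, ..., a_n) under Ewens's formula.

    a[j-1] is the number of alleles seen exactly j times; theta must be positive.
    """
    n = len(a)
    if sum(j * a_j for j, a_j in enumerate(a, start=1)) != n:
        raise ValueError("counts must satisfy a_1 + 2*a_2 + ... + n*a_n = n")
    theta = Fraction(theta)
    if theta <= 0:
        raise ValueError("theta must be positive")

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i

    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob
```

For example, ewens_pmf((1, 1, 0), 1) evaluates to Fraction(1, 2): in a sample of three gametes with θ = 1, one allele is seen once and another twice with probability 1/2.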

The phrase "under certain conditions" used above is made precise by the following assumptions:

  • The sample size n is small by comparison to the size of the whole population; and
  • The population is in statistical equilibrium under mutation and genetic drift and the role of selection at the locus in question is negligible; and
  • Every mutant allele is novel. (See also idealised population.)

This is a probability distribution on the set of all partitions of the integer n. Among probabilists and statisticians it is often called the multivariate Ewens distribution.
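
As a small worked check that the formula defines a probability distribution, take n = 3. The constraint admits exactly three count vectors, (a_1, a_2, a_3) = (3, 0, 0), (1, 1, 0) and (0, 0, 1), and substituting each into the formula gives

\operatorname{Pr}(3,0,0;\theta)={\theta^2 \over (\theta+1)(\theta+2)},\quad \operatorname{Pr}(1,1,0;\theta)={3\theta \over (\theta+1)(\theta+2)},\quad \operatorname{Pr}(0,0,1;\theta)={2 \over (\theta+1)(\theta+2)},

which sum to 1 because θ^2 + 3θ + 2 = (θ + 1)(θ + 2).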

Mathematical properties

In the limit θ → 0, the probability that all n genes are the same approaches 1. When θ = 1, the distribution is precisely that of the integer partition induced by the cycle lengths of a uniformly distributed random permutation of n elements. As θ → ∞, the probability that no two of the n genes are the same approaches 1.
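
The θ = 1 case can be checked by brute force for small n. The sketch below enumerates all permutations of four elements, tabulates their cycle types, and compares the frequencies with the formula at θ = 1, where the prefactor n!/(1·2···n) equals 1. The helper names are illustrative assumptions, not part of any library.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Return (a_1, ..., a_n), where a_j counts the cycles of length j in perm."""
    n = len(perm)
    seen = [False] * n
    a = [0] * n
    for start in range(n):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and record its length.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        a[length - 1] += 1
    return tuple(a)


def cycle_type_distribution(n):
    """Exact distribution of the cycle type of a uniform random permutation."""
    counts = Counter(cycle_type(p) for p in permutations(range(n)))
    total = factorial(n)
    return {a: Fraction(c, total) for a, c in counts.items()}


def ewens_theta_one(a):
    """Ewens's formula at theta = 1: the product of 1 / (j^a_j * a_j!)."""
    prob = Fraction(1)
    for j, a_j in enumerate(a, start=1):
        prob /= j ** a_j * factorial(a_j)
    return prob


dist = cycle_type_distribution(4)
# Every cycle type has exactly the probability given by the formula.
assert all(p == ewens_theta_one(a) for a, p in dist.items())
```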

This family of probability distributions enjoys the following consistency property: if, after the sample of n is taken, m of the n gametes are chosen without replacement, then the resulting probability distribution on the set of all partitions of the smaller integer m is exactly what the formula above would give if m were put in place of n.
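
This consistency property can be verified exactly for small samples. The sketch below removes a single uniformly chosen gamete (the case m = n − 1; repeating the step covers any smaller m) and compares the result with the formula for n − 1. All function names are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def ewens_pmf(a, theta):
    """Ewens's formula for the count vector a = (a_1, ..., a_n)."""
    n, theta = len(a), Fraction(theta)
    prob = Fraction(factorial(n))
    for i in range(n):
        prob /= theta + i
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def to_counts(parts, n):
    """Convert a partition of n into its count vector (a_1, ..., a_n)."""
    a = [0] * n
    for p in parts:
        a[p - 1] += 1
    return tuple(a)


def drop_one_gamete(n, theta):
    """Distribution of the count vector after removing one uniformly chosen gamete."""
    out = defaultdict(Fraction)
    for parts in partitions(n):
        a = to_counts(parts, n)
        p = ewens_pmf(a, theta)
        for j in range(1, n + 1):
            if a[j - 1] == 0:
                continue
            b = list(a)
            b[j - 1] -= 1          # an allele seen j times ...
            if j > 1:
                b[j - 2] += 1      # ... is now seen j - 1 times
            # The removed gamete carries such an allele with probability j * a_j / n.
            out[tuple(b[: n - 1])] += p * Fraction(j * a[j - 1], n)
    return dict(out)


def ewens_distribution(n, theta):
    """The full Ewens distribution on count vectors for sample size n."""
    return {to_counts(p, n): ewens_pmf(to_counts(p, n), theta) for p in partitions(n)}
```

With these definitions, drop_one_gamete(5, Fraction(3, 2)) and ewens_distribution(4, Fraction(3, 2)) are equal as dictionaries of exact fractions.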

The Ewens distribution arises naturally from the Chinese restaurant process.
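
A minimal simulation sketch of this connection follows, assuming the standard seating rule in which customer i + 1 opens a new table with probability θ/(θ + i) and otherwise joins an existing table with probability proportional to its current size. Tables play the role of alleles, and the function name sample_crp_counts is an assumption made here.

```python
import random


def sample_crp_counts(n, theta, rng=None):
    """Seat n customers by the Chinese restaurant process and return (a_1, ..., a_n).

    a_j is the number of tables (alleles) with exactly j customers (gametes).
    """
    if rng is None:
        rng = random.Random()
    tables = []  # tables[k] = number of customers currently at table k
    for i in range(n):
        # With i customers already seated, open a new table with
        # probability theta / (theta + i).
        if rng.random() * (theta + i) < theta:
            tables.append(1)
        else:
            # Otherwise join an existing table, chosen proportionally to its size.
            k = rng.choices(range(len(tables)), weights=tables)[0]
            tables[k] += 1
    a = [0] * n
    for size in tables:
        a[size - 1] += 1
    return tuple(a)
```

Drawing many samples, for instance with sample_crp_counts(3, 1.0, random.Random(0)) called repeatedly on a shared generator, should give empirical frequencies of the count vectors close to those of the sampling formula.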

Notes

  • Warren Ewens, "The sampling theory of selectively neutral alleles", Theoretical Population Biology, volume 3, pages 87–112, 1972.
  • J. F. C. Kingman, "Random partitions in population genetics", Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, volume 361, number 1704, 1978.
  • S. Tavaré and W. J. Ewens, "The multivariate Ewens distribution", Chapter 41 of the reference below, 1997.
  • N. L. Johnson, S. Kotz, and N. Balakrishnan, Discrete Multivariate Distributions, Wiley, 1997. ISBN 0-471-12844-9.