Linkage disequilibrium

Linkage disequilibrium is a term used in population genetics for the non-random association of alleles at two or more loci, not necessarily on the same chromosome. It is not the same as linkage, which describes the association of two or more loci on a chromosome with limited recombination between them. Linkage disequilibrium describes a situation in which some combinations of alleles or genetic markers occur more or less frequently in a population than would be expected if haplotypes were formed at random from alleles according to their frequencies. Non-random associations between polymorphisms at different loci are measured by the degree of linkage disequilibrium (LD). A comparison of different measures is provided by Devlin & Risch.^[1]

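Linkage disequilibrium is generally shaped by genetic linkage and the rate of recombination, the mutation rate, random drift, non-random mating, and population structure. For example, some organisms, such as bacteria, may show linkage disequilibrium because they reproduce asexually, so there is no recombination ($r=0$) to break it down: under random mating, the disequilibrium in the next generation is $D_{t+1}=(1-r)D_{t}$, where $D$ is the coefficient of linkage disequilibrium defined below and $r$ is the recombination fraction between the two loci.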
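Applied over several generations, the decay relation above gives $D_{t}=(1-r)^{t}D_{0}$. The following is a minimal sketch of that recursion, assuming a constant recombination fraction and no drift, selection, or mutation; the function name `ld_decay` and its parameters are illustrative, not taken from any package.

```python
def ld_decay(d0, r, generations):
    """Return [D_0, D_1, ..., D_generations] under D_{t+1} = (1 - r) * D_t.

    Assumption: r is a constant recombination fraction in [0, 0.5] and
    recombination is the only force acting on D.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction r is assumed to lie in [0, 0.5]")
    values = [d0]
    for _ in range(generations):
        # Each generation, recombination removes a fraction r of the remaining D.
        values.append((1.0 - r) * values[-1])
    return values


# Unlinked loci (r = 0.5) halve D every generation.
print(ld_decay(0.25, 0.5, 2))  # → [0.25, 0.125, 0.0625]
# Without recombination (r = 0), D does not decay at all.
print(ld_decay(0.2, 0.0, 3))   # → [0.2, 0.2, 0.2, 0.2]
```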

For background, see genetic equilibrium and its application in the Hardy-Weinberg principle.

The International HapMap Project enables the study of LD in human populations online. The Ensembl project integrates HapMap data, and other data from dbSNP, with other genetic information.

Linkage disequilibrium measure, $\delta$

Formally, to define pairwise LD, consider indicator variables $I_{1},I_{2}$ that mark the presence of a chosen allele at each of two loci. The LD parameter $\delta$ (delta) is defined as their covariance:

$$\delta :=\operatorname {cov} (I_{1},I_{2})=h_{12}-p_{1}p_{2}$$

Here $p_{1},p_{2}$ denote the marginal frequencies of the chosen alleles at the two loci and $h_{12}$ denotes the frequency of the haplotype carrying both alleles. For two biallelic loci, the same quantity equals the product of the frequencies of the two haplotypes carrying both or neither of the chosen alleles, minus the product of the frequencies of the two mixed haplotypes; in the notation of the next section this is $x_{11}x_{22}-x_{12}x_{21}$. Various derivatives of this parameter have been developed. In the genetic literature the wording "two alleles are in LD" usually means $\delta \neq 0$ . Conversely, linkage equilibrium denotes the case $\delta =0$ .
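To make the covariance definition concrete, the sketch below estimates $\delta$ from a list of phased haplotypes as $E[I_{1}I_{2}]-E[I_{1}]E[I_{2}]$. It is an illustration under stated assumptions: haplotypes are given as pairs of allele labels, the population (divide-by-$n$) covariance is used, and the name `delta_from_haplotypes` is hypothetical.

```python
def delta_from_haplotypes(haplotypes, allele_1, allele_2):
    """Covariance of the indicators for allele_1 (locus 1) and allele_2 (locus 2).

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples
    (an assumed input format; phase is taken as known).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    # Indicator variables I_1 and I_2, one value per haplotype.
    i1 = [1 if a == allele_1 else 0 for a, _ in haplotypes]
    i2 = [1 if b == allele_2 else 0 for _, b in haplotypes]
    p1 = sum(i1) / n                                 # marginal frequency at locus 1
    p2 = sum(i2) / n                                 # marginal frequency at locus 2
    h12 = sum(x * y for x, y in zip(i1, i2)) / n     # joint haplotype frequency
    return h12 - p1 * p2


sample = [("A", "B")] * 4 + [("a", "b")] * 4 + [("A", "b"), ("a", "B")]
print(round(delta_from_haplotypes(sample, "A", "B"), 4))  # → 0.15
```

A sample in which all four haplotypes are equally frequent gives $\delta =0$, the linkage equilibrium case.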

Linkage disequilibrium measure, D

Consider two loci A and B with two alleles each (a two-locus, two-allele model). The following table gives the frequency of each haplotype:

Haplotype	Frequency
$A_{1}B_{1}$	$x_{11}$
$A_{1}B_{2}$	$x_{12}$
$A_{2}B_{1}$	$x_{21}$
$A_{2}B_{2}$	$x_{22}$

From there one can determine the frequency of each of the alleles:

Allele	Frequency
$A_{1}$	$p_{1}=x_{11}+x_{12}$
$A_{2}$	$p_{2}=x_{21}+x_{22}$
$B_{1}$	$q_{1}=x_{11}+x_{21}$
$B_{2}$	$q_{2}=x_{12}+x_{22}$

If the two loci and their alleles are independent of each other, then observing the haplotype $A_{1}B_{1}$ amounts to observing $A_{1}$ and, independently, $B_{1}$. The table above lists the frequencies of $A_{1}$ and $B_{1}$ as $p_{1}$ and $q_{1}$ , so by the product rule for independent events the expected frequency of $A_{1}B_{1}$ is $x_{11}=p_{1}q_{1}$ .

The deviation of the observed haplotype frequency from this expectation is the linkage disequilibrium parameter, introduced by Robbins (1918)^[2], named by Lewontin and Kojima (1960)^[3], and commonly denoted by a capital $D$ : $D=x_{11}-p_{1}q_{1}$ . The following table shows how $D$ enters each haplotype frequency.

	$A_{1}$	$A_{2}$	Total
$B_{1}$	$x_{11}=p_{1}q_{1}+D$	$x_{21}=p_{2}q_{1}-D$	$q_{1}$
$B_{2}$	$x_{12}=p_{1}q_{2}-D$	$x_{22}=p_{2}q_{2}+D$	$q_{2}$
Total	$p_{1}$	$p_{2}$	$1$
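The definitions above translate directly into a short calculation. The following is a minimal sketch rather than a reference implementation; the function name `ld_coefficient` and the tolerance used to check that the frequencies sum to 1 are assumptions made for illustration.

```python
def ld_coefficient(x11, x12, x21, x22):
    """Return D = x11 - p1 * q1 for a two-locus, two-allele model.

    x_ij is the frequency of haplotype A_i B_j; the four values must sum to 1.
    """
    total = x11 + x12 + x21 + x22
    if abs(total - 1.0) > 1e-9:  # tolerance is an arbitrary illustrative choice
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = x11 + x12  # frequency of allele A_1
    q1 = x11 + x21  # frequency of allele B_1
    return x11 - p1 * q1


d = ld_coefficient(0.4, 0.1, 0.1, 0.4)
print(round(d, 4))  # → 0.15
```

For two biallelic loci the same value can be obtained as $x_{11}x_{22}-x_{12}x_{21}$ ; in the example this is $0.16-0.01=0.15$ .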

When these formulas are extended to diploid cells, rather than investigating the gametes/haplotypes directly, the same principle holds, but the recombination rate between the two loci $A$ and $B$ must be taken into account. In this context it is commonly denoted by the letter $c$ (written $r$ in the introduction).

$D$ is easy to calculate with but has the disadvantage of depending on the frequencies of the alleles inspected: because haplotype frequencies lie between 0 and 1, the values that $D$ can take are constrained by the allele frequencies. $D$ is necessarily zero if either locus has an allele frequency of 0 or 1, and its attainable range is widest when all allele frequencies are 0.5. Lewontin (1964) suggested normalising $D$ by dividing it by the theoretical maximum for the observed allele frequencies. Thus $D'={\frac {D}{D_{\max }}}$ when $D\geq 0$ , and $D'={\frac {D}{D_{\min }}}$ when $D<0$ .

$D_{\max }$ is given by the smaller of $p_{1}q_{2}$ and $p_{2}q_{1}$ . $D_{\min }$ is given by the larger of $-p_{1}q_{1}$ and $-p_{2}q_{2}$ .
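The normalisation can be sketched as follows, assuming $D$ and the allele frequencies $p_{1}$ and $q_{1}$ have already been computed. The function name `d_prime` is hypothetical, and raising an error when a locus is monomorphic (so that the bound is zero) is a design choice of this sketch, not part of the definition.

```python
def d_prime(d, p1, q1):
    """Normalise D by its bound for the given allele frequencies.

    Follows the convention stated above: divide by D_max when D >= 0 and by
    D_min when D < 0, so the result is non-negative in both cases.
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    if d >= 0:
        bound = min(p1 * q2, p2 * q1)    # D_max
    else:
        bound = max(-p1 * q1, -p2 * q2)  # D_min (a negative number)
    if bound == 0:
        raise ValueError("D' is undefined when a locus has allele frequency 0 or 1")
    return d / bound


print(round(d_prime(0.15, 0.5, 0.5), 4))   # → 0.6
print(round(d_prime(-0.15, 0.5, 0.5), 4))  # → 0.6
```

Because $D_{\min }$ is negative, dividing a negative $D$ by it yields a positive value, which is why both calls print the same number.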

Another measure is the correlation coefficient between pairs of loci, usually reported as its square, $r^{2}={\frac {D^{2}}{p_{1}p_{2}q_{1}q_{2}}}$ . Unlike $D'$ , this measure is not adjusted for the loci having different allele frequencies. If it were, $r$ (the square root of $r^{2}$ , given the sign of $D$ ) would be equivalent to $D'$ .^[4]
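A minimal sketch of the $r^{2}$ formula, assuming $D$ , $p_{1}$ and $q_{1}$ are already known; the function name `r_squared` is illustrative only.

```python
def r_squared(d, p1, q1):
    """Return r^2 = D^2 / (p1 * p2 * q1 * q2) for two biallelic loci."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    denom = p1 * p2 * q1 * q2
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus has allele frequency 0 or 1")
    return d * d / denom


print(round(r_squared(0.15, 0.5, 0.5), 4))  # → 0.36
print(round(r_squared(0.1, 0.2, 0.5), 4))   # → 0.25
```

The second call illustrates the dependence on allele frequencies: with $p_{1}=0.2$ and $q_{1}=0.5$ , the value $D=0.1$ is the largest these frequencies allow (the smaller of $p_{1}q_{2}$ and $p_{2}q_{1}$ ), yet $r^{2}$ is only 0.25.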

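Another statistic, used in a test of selective neutrality, is Tajima's D, which assesses whether the mean number of differences between pairs of DNA sequences is compatible with the observed number of segregating sites in a sample. Despite the shared letter, it is a different quantity from the linkage disequilibrium parameter $D$ .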

These are summary statistics (i.e. descriptive statistics summarizing the pattern of genetic diversity) that are computed from diploid samples of DNA sequences and which assume that the gametic phase is known.

Analysis Software

Haploview
LdCompare^[5] - open-source software for calculating LD.
PyPop
HelixTree - commercial software with interactive LD plot.

References

^ Devlin B., Risch N. (1995). "A Comparison of Linkage Disequilibrium Measures for Fine-Scale Mapping". Genomics. 29: 311–322.
^ Robbins R.B. (1918). "Some applications of mathematics to breeding problems III". Genetics. 3: 375–389.
^ Lewontin R.C., Kojima K. (1960). "The evolutionary dynamics of complex polymorphisms". Evolution. 14: 458–472.
^ Hedrick P.W., Kumar S. (2001). "Mutation and linkage disequilibrium in human mtDNA". Eur. J. Hum. Genet. 9: 969–972.
^ Hao K., Di X., Cawley S. (2007). "LdCompare: rapid computation of single- and multiple-marker r2 and genetic coverage". Bioinformatics. 23: 252–254.

