Genetic correlation

Genetic correlation is the proportion of variance that two traits share due to genetic causes.[1] Heritability is the proportion of observable differences in a trait between individuals within a population that is due to genetic differences. Outside the theoretical boundary case of traits with zero heritability, the genetic correlation of two traits is independent of their heritabilities: two traits can have a very high genetic correlation even when the heritability of each is low, and vice versa.

The genetic correlation, then, tells us how much of the genetic influence on two traits is common to both: if it is above zero, this suggests that the two traits are influenced by common genes. This can be an important constraint on conceptualizations of the two traits: traits which seem different phenotypically but which share a common genetic basis require an explanation for how these genes can influence both traits.

For example, consider two traits: dark skin and black hair. Each may have a very high heritability, meaning that most of the population-level variation in the trait is due to genetic differences. They may nevertheless have a very low genetic correlation if, for instance, the two traits are controlled by different, non-overlapping, non-linked genetic loci.
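The distinction between heritability and genetic correlation can be illustrated with a minimal sketch. All variance and covariance values below are hypothetical numbers chosen only for illustration, and the model assumes just a genetic and an environmental variance component per trait; the function names are likewise illustrative.

```python
import math

def heritability(var_g, var_e):
    """Genetic variance as a share of total variance (genetic + environmental only)."""
    return var_g / (var_g + var_e)

def genetic_correlation(cov_g, var_g1, var_g2):
    """Genetic covariance scaled by the two genetic standard deviations."""
    return cov_g / math.sqrt(var_g1 * var_g2)

# Hypothetical scenario A: both traits highly heritable, little shared genetic variance.
h2_a = heritability(var_g=90.0, var_e=10.0)
rg_a = genetic_correlation(cov_g=9.0, var_g1=90.0, var_g2=90.0)

# Hypothetical scenario B: both traits weakly heritable, mostly shared genetic variance.
h2_b = heritability(var_g=10.0, var_e=90.0)
rg_b = genetic_correlation(cov_g=9.0, var_g1=10.0, var_g2=10.0)

print("A: heritability", round(h2_a, 2), "genetic correlation", round(rg_a, 2))
print("B: heritability", round(h2_b, 2), "genetic correlation", round(rg_b, 2))
```

In scenario A the heritability is high and the genetic correlation low; in scenario B the reverse holds, because the environmental variance enters the heritability but not the genetic correlation.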

Computing the genetic correlation

Estimating a genetic correlation requires a genetically informative sample, such as a twin study.

Given a genetic covariance matrix, the genetic correlation is computed by standardizing it, i.e., by converting the covariance matrix to a correlation matrix. For example, suppose two traits, say height and weight, have the following additive genetic variance-covariance matrix:

          Height  Weight
Height        36      36
Weight        36     117

Then the genetic correlation is .55, as seen in the standardized matrix below:

          Height  Weight
Height         1
Weight       .55       1

In practice, structural equation modeling applications such as OpenMx are used to calculate both the genetic covariance matrix and its standardized form. In R, cov2cor() will standardize the matrix.
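As a rough illustration of the standardization step, the sketch below divides each entry of the height and weight matrix by the product of the corresponding standard deviations. It is written in plain Python rather than R, and the helper name cov_to_cor is an assumption made for this example, not a library function.

```python
import math

def cov_to_cor(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]  # standard deviations from the diagonal
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

# Additive genetic variance-covariance matrix for height and weight from the text.
genetic_cov = [[36.0, 36.0],
               [36.0, 117.0]]

genetic_cor = cov_to_cor(genetic_cov)
r_g = genetic_cor[1][0]  # off-diagonal entry: genetic correlation of height and weight

print(round(r_g, 2))
```

The off-diagonal entry is 36 / (sqrt(36) × sqrt(117)), which rounds to .55.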

Typically, published reports provide genetic variance components that have been standardized as a proportion of total variance (for instance, in an ACE twin study model, standardised as a proportion of V-total = A+C+E). In this case, the metric needed to compute the genetic covariance (the variances within the genetic covariance matrix) is lost through standardization, so the genetic correlation of two traits cannot readily be estimated from such published models. Multivariate models (such as the Cholesky decomposition) do, however, allow the reader to see shared genetic effects (as opposed to the genetic correlation) by following path rules. It is therefore important to provide the unstandardised path coefficients in publications.
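To show why unstandardised path coefficients are useful, the sketch below applies path rules to a bivariate Cholesky model: each genetic variance or covariance is the sum of products of the paths that connect the two traits through a common latent factor. The path values 6, 6 and 9 are hypothetical, chosen here only because they reproduce the height and weight matrix used above; they are not taken from any published model.

```python
import math

# Hypothetical unstandardised additive genetic paths (lower-triangular Cholesky form).
# Row 0: height loads on factor A1 only; row 1: weight loads on A1 and A2.
paths = [[6.0, 0.0],
         [6.0, 9.0]]

def implied_covariance(lower):
    """Path rules: covariance(i, j) is the sum over factors of path_i * path_j."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

genetic_cov = implied_covariance(paths)

# Genetic correlation from the implied (unstandardised) genetic covariance matrix.
r_g = genetic_cov[0][1] / math.sqrt(genetic_cov[0][0] * genetic_cov[1][1])

print(genetic_cov)
print(round(r_g, 2))
```

With these paths the implied genetic variances are 36 and 117 and the genetic covariance is 36, so the genetic correlation again rounds to .55.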

Non-twin methods of estimating genetic correlations include GCTA and LD score regression.[2] An advantage of the latter is that it does not require subject-level data but can be computed from GWAS summary statistics. This enables collaborations such as LD Hub, which consolidates more than 844 datasets to allow genetic correlation inference across scores of phenotypes and was demonstrated by computing the genetic correlations of 49 traits.[3]

References

  1. Neale, M. C., & Maes, H. H. (1996). Methodology for genetics studies of twins and families (6th ed.). Dordrecht, The Netherlands: Kluwer.
  2. Bulik-Sullivan et al. (2015). "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies".
  3. Zheng et al. (2016). "LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis".

External links