# Genetic correlation


In multivariate quantitative genetics, a genetic correlation (denoted ${\displaystyle r_{g}}$ or ${\displaystyle r_{a}}$) is the proportion of variance that two traits share due to genetic causes,[1][2][3] or equivalently the correlation between the genetic influences on one trait and the genetic influences on a different trait,[4][5][6][7][8][9][10] which estimates the degree of pleiotropy or causal overlap. A genetic correlation of 0 implies that the genetic effects on one trait are independent of those on the other, while a correlation of 1 implies that all of the genetic influences on the two traits are identical. The bivariate genetic correlation can be generalized to inferring genetic latent variable factors across more than two traits using factor analysis. Genetic correlation models were introduced into behavioral genetics in the 1970s–1980s.

Genetic correlations have applications in validation of genome-wide association study (GWAS) results, breeding, prediction of traits, and discovering the etiology of traits & diseases.

They can be estimated using twin studies and molecular genetics. Genetic correlations have been found to be common in non-human genetics[11] and to be broadly similar to their respective phenotypic correlations,[12] and they have also been found extensively among human traits, collectively dubbed the 'phenome'.[13][14][15][16][17][18][19][20][21][22][23]

This finding of widespread pleiotropy has implications for artificial selection in agriculture, interpretation of phenotypic correlations, social inequality,[24] attempts to use Mendelian randomization in causal inference,[25][26][27][28][29][30] the understanding of the biological origins of complex traits, and the design of GWASes.

A genetic correlation is to be contrasted with an environmental correlation, which is the correlation between the environments affecting two traits (e.g. if poor nutrition in a household caused both lower IQ and lower height). A genetic correlation between two traits can contribute to the observed (phenotypic) correlation between them, but a genetic correlation can also have the opposite sign to the observed phenotypic correlation if the environmental correlation is sufficiently strong in the other direction, perhaps due to tradeoffs or specialization.[31][32] The observation that genetic correlations usually mirror phenotypic correlations is known as "Cheverud's Conjecture".[33] It has been confirmed in animals[34][35] and humans, where the two kinds of correlation were also shown to be of similar size.[36] For example, in the UK Biobank, of 118 continuous human traits, only 29% of their intercorrelations have opposite signs,[22] and a later analysis of 17 high-quality UKBB traits reported a correlation near unity.[37]

## Interpretation

Genetic correlations are not the same as heritability: a genetic correlation concerns the overlap between the two sets of genetic influences, not their absolute magnitude. Two traits could both be highly heritable but not genetically correlated, or have small heritabilities and be completely correlated (as long as the heritabilities are non-zero).

For example, consider two traits: dark skin and black hair. Each may individually have a very high heritability (most of the population-level variation in the trait is due to genetic differences or, in simpler terms, genetics contributes significantly to the trait). However, they may still have a very low genetic correlation if, for instance, the two traits are controlled by different, non-overlapping, non-linked genetic loci.

A genetic correlation between two traits will tend to produce phenotypic correlations: e.g. the genetic correlation between intelligence and SES[15] or education and family SES[38] implies that intelligence/SES will also correlate phenotypically. The phenotypic correlation will be limited by the degree of genetic correlation and also by the heritability of each trait. The expected phenotypic correlation is the 'bivariate heritability' and can be calculated as the product of the square roots of the two heritabilities and the genetic correlation. (Using a Plomin example,[39] for two traits with heritabilities of 0.60 & 0.23, ${\displaystyle r_{g}=0.75}$, and phenotypic correlation of r=0.45, the bivariate heritability would be ${\displaystyle {\sqrt {0.60}}\cdot 0.75\cdot {\sqrt {0.23}}=0.28}$, so of the observed phenotypic correlation, 0.28/0.45 = 62% is due to genetics.)
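The arithmetic of the Plomin example can be checked with a short Python sketch. The function name `bivariate_heritability` is illustrative only and does not come from any library; the input values are those quoted in the text.

```python
import math


def bivariate_heritability(h2_x: float, h2_y: float, r_g: float) -> float:
    """Genetic contribution to the phenotypic correlation:
    sqrt(h2_x) * r_g * sqrt(h2_y)."""
    return math.sqrt(h2_x) * r_g * math.sqrt(h2_y)


# Values quoted in the Plomin example.
h2_x, h2_y = 0.60, 0.23   # heritabilities of the two traits
r_g = 0.75                # genetic correlation
r_p = 0.45                # observed phenotypic correlation

biv_h2 = bivariate_heritability(h2_x, h2_y, r_g)
genetic_share = biv_h2 / r_p  # fraction of r_p attributable to genetics

print(f"bivariate heritability: {biv_h2:.2f}")
print(f"share of phenotypic correlation due to genetics: {genetic_share:.0%}")
```

Computing the share from the unrounded bivariate heritability gives about 0.619 rather than the 0.622 obtained from the rounded 0.28, but both round to 62%.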

## Cause

Genetic correlations can arise due to:[18]

1. linkage disequilibrium (two neighboring genes tend to be inherited together, each affecting a different trait)
2. biological pleiotropy (a single gene having multiple otherwise unrelated biological effects, or shared regulation of multiple genes[40])
3. mediated pleiotropy (a gene causes trait X and trait X causes trait Y)
4. biases: population stratification such as ancestry or assortative mating (sometimes called "gametic phase disequilibrium"), spurious stratification such as ascertainment bias/self-selection[41] or Berkson's paradox, or misclassification of diagnoses

## Uses

### Causes of changes in traits

Genetic correlations are scientifically useful because they can be analyzed longitudinally within an individual over time[42] (e.g. intelligence is stable over a lifetime due to the same genetic influences: intelligence in childhood genetically correlates ${\displaystyle r_{g}=0.62}$ with intelligence in old age[43]), across studies, populations, or ethnic groups/races, or across diagnoses. Such comparisons allow discovery of whether different genes influence a trait over a lifetime (typically, they do not[4]); whether different genes influence a trait in different populations due to differing local environments; whether there is disease heterogeneity across times, places, or sexes (particularly in psychiatric diagnoses, where it is uncertain whether one country's 'autism' or 'schizophrenia' is the same as another's, or whether diagnostic categories have shifted over time or place, leading to different levels of ascertainment bias); and to what degree traits like autoimmune or psychiatric disorders or cognitive functioning meaningfully cluster due to sharing a biological basis and genetic architecture.

For example, reading & mathematics disability genetically correlate, consistent with the Generalist Genes Hypothesis, and these genetic correlations explain the observed phenotypic correlations or 'co-morbidity'.[44] Likewise, IQ and specific measures of cognitive performance (verbal, spatial, and memory tasks, reaction time, long-term memory, executive function, etc.) all show high genetic correlations, as do neuroanatomical measurements, and the correlations may increase with age, with implications for the etiology & nature of intelligence. This can be an important constraint on conceptualizations of the two traits: traits which seem different phenotypically but which share a common genetic basis require an explanation for how these genes can influence both traits.

### Boosting GWASes

Genetic correlations can be used in GWASes by using polygenic scores or genome-wide hits for one (often more easily measured) trait to increase the prior probability of variants for a second trait. For example, since intelligence and years of education are highly genetically correlated, a GWAS for education will inherently also be a GWAS for intelligence and will be able to predict variance in intelligence as well,[45] and the strongest SNP candidates can be used to increase the statistical power of a smaller GWAS.[46] Alternatively, a combined analysis can be done on the latent trait, in which each measured genetically correlated trait helps reduce measurement error and boosts the GWAS's power considerably (e.g. Krapohl et al. 2017 used elastic net and multiple polygenic scores to improve intelligence prediction from 3.6% of variance to 4.8%;[47] Hill et al. 2017b[48] used MTAG[49] to combine 3 g-loaded traits, namely education, household income, and a cognitive test score, finding 107 hits and doubling the predictive power for intelligence), or one could do a GWAS for multiple traits jointly.[50][51]

Genetic correlations can also quantify the contribution of correlations <1 across datasets which might create a false "missing heritability", by estimating the extent to which differing measurement methods, racial influences, or environments create only partially overlapping sets of relevant genetic variants.[52]

### Breeding

> Hairless dogs have imperfect teeth; long-haired and coarse-haired animals are apt to have, as is asserted, long or many horns; pigeons with feathered feet have skin between their outer toes; pigeons with short beaks have small feet, and those with long beaks large feet. Hence if man goes on selecting, and thus augmenting any peculiarity, he will almost certainly modify unintentionally other parts of the structure, owing to the mysterious laws of correlation.
>
> Charles Darwin, *On the Origin of Species*

Genetic correlations are also useful in applied contexts such as plant/animal breeding. They allow substitution of more easily measured but highly genetically correlated characteristics (particularly in the case of sex-linked or binary traits under the liability-threshold model, where differences in the phenotype can rarely be observed but another highly correlated measure, perhaps an endophenotype, is available in all individuals); compensation for environments that differ from the one in which the breeding was carried out; more accurate predictions of breeding value using the multivariate breeder's equation, as compared to predictions based on the univariate breeder's equation using only per-trait heritability & assuming independence of traits; and avoidance of unexpected consequences, by taking into consideration that artificial selection for/against trait X will also increase/decrease all traits which positively/negatively correlate with X.[53][54][55][56][57] The limits to selection set by the inter-correlation of traits, and the possibility for genetic correlations to change over long-term breeding programs, lead to Haldane's dilemma limiting the intensity of selection and thus progress.
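To illustrate why the multivariate prediction differs from the univariate one, here is a minimal Python sketch. It assumes the multivariate breeder's equation in the commonly used form Δz̄ = Gβ (additive genetic covariance matrix times the vector of selection gradients), which the text above names but does not write out. The matrix `G`, the gradient `beta`, and the function name are hypothetical illustration values, not data from any study.

```python
def predicted_response(G, beta):
    """Multivariate response to selection, assumed form: delta_z = G @ beta.
    Each trait's change sums the direct and the correlated contributions."""
    return [sum(g_ij * b_j for g_ij, b_j in zip(row, beta)) for row in G]


# Hypothetical additive genetic (co)variances for two traits, X and Y.
# The negative off-diagonal entry encodes a negative genetic correlation.
G = [
    [4.0, -1.5],
    [-1.5, 2.0],
]

# Hypothetical selection gradients: select for X only, ignore Y.
beta = [0.5, 0.0]

multivariate = predicted_response(G, beta)

# Univariate prediction: treat the traits as independent, using only
# each trait's own genetic variance (the diagonal of G).
univariate = [G[i][i] * beta[i] for i in range(len(beta))]

print("multivariate prediction:", multivariate)
print("univariate prediction:  ", univariate)
```

Under these made-up numbers the two predictions agree for trait X, but only the multivariate one shows trait Y decreasing as a correlated response to selection on X.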

Breeding experiments on genetically correlated traits can measure the extent to which correlated traits are inherently developmentally linked, so that the response to selection is constrained, and the extent to which they can be dissociated.[58] Some traits, such as the size of eyespots on the butterfly *Bicyclus anynana*, can be dissociated in breeding,[59] but other pairs, such as eyespot colors, have resisted such efforts.[60]

## Mathematical definition

Given a genetic covariance matrix, the genetic correlation is computed by standardizing this, i.e., by converting the covariance matrix to a correlation matrix. Generally, if ${\displaystyle \Sigma }$ is a genetic covariance matrix and ${\displaystyle D={\sqrt {{\text{diag}}(\Sigma )}}}$, then the correlation matrix is ${\displaystyle D^{-1}\Sigma D^{-1}}$. For a given genetic covariance ${\displaystyle cov_{g}}$ between two traits, one with genetic variance ${\displaystyle V_{g1}}$ and the other with genetic variance ${\displaystyle V_{g2}}$, the genetic correlation is computed in the same way as the correlation coefficient ${\displaystyle r_{g}={\frac {cov_{g}}{\sqrt {V_{g1}V_{g2}}}}}$.

## Computing the genetic correlation

Genetic correlations require a genetically informative sample. They can be estimated in breeding experiments on two traits of known heritability, by selecting on one trait and measuring the change in the other (which allows the genetic correlation to be inferred); in family/adoption/twin studies (analyzed using SEMs or DeFries–Fulker extremes analysis); by molecular estimation of relatedness such as GCTA;[61] by methods working from GWAS summary statistics or genotype data, like LD score regression,[16][62] BOLT-REML,[63] CPBayes,[64] or HESS;[65] by comparison of genome-wide SNP hits in GWASes (as a loose lower bound); and from phenotypic correlations of populations with at least some related individuals.[66]

As with estimating SNP heritability, better computational scaling & the ability to estimate using only publicly available GWAS summary statistics are particular advantages of LD score regression over competing methods. Combined with the increasing availability of such summary statistics from datasets like the UK Biobank, this has led to an explosion of genetic correlation research in the 2010s.[citation needed]

The methods are related to Haseman-Elston regression & PCGC regression.[67] Such methods are typically genome-wide, but it is also possible to estimate genetic correlations for specific variants or genome regions.[68]

One way to think about a twin-based estimate is as using trait X in twin 1 to predict trait Y in twin 2, separately for monozygotic and dizygotic twins (e.g. using twin 1's IQ to predict twin 2's brain volume). If this cross-twin cross-trait correlation is larger for the more genetically similar monozygotic twins than for the dizygotic twins, the difference indicates that the traits are not genetically independent and that some common genetic influences affect both IQ and brain volume. (Statistical power can be boosted by using siblings as well.[69])
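The cross-twin logic can be turned into a rough point estimate with Falconer-style arithmetic. The Python sketch below is an assumption-laden illustration, not the structural equation modeling used in practice: it assumes purely additive genetic effects and equal environments for MZ and DZ pairs, so that heritability is approximately 2(r_MZ − r_DZ) and the standardized genetic covariance is approximately twice the MZ–DZ difference in cross-twin cross-trait correlations. All correlation values and the function name are hypothetical.

```python
import math


def falconer_genetic_correlation(r_mz_x, r_dz_x, r_mz_y, r_dz_y,
                                 r_mz_xy, r_dz_xy):
    """Rough twin-based estimate of r_g (additive-only assumption).

    r_mz_x, r_dz_x   : twin correlations for trait X (MZ, DZ pairs)
    r_mz_y, r_dz_y   : twin correlations for trait Y (MZ, DZ pairs)
    r_mz_xy, r_dz_xy : cross-twin cross-trait correlations
                       (trait X in twin 1 with trait Y in twin 2)
    """
    h2_x = 2 * (r_mz_x - r_dz_x)       # heritability of X
    h2_y = 2 * (r_mz_y - r_dz_y)       # heritability of Y
    cov_g = 2 * (r_mz_xy - r_dz_xy)    # standardized genetic covariance
    return h2_x, h2_y, cov_g / math.sqrt(h2_x * h2_y)


# Hypothetical correlations, chosen only to illustrate the calculation.
h2_x, h2_y, r_g = falconer_genetic_correlation(
    r_mz_x=0.80, r_dz_x=0.50,
    r_mz_y=0.70, r_dz_y=0.45,
    r_mz_xy=0.35, r_dz_xy=0.20,
)

print(f"h2_x = {h2_x:.2f}, h2_y = {h2_y:.2f}, r_g = {r_g:.2f}")
```

Because the MZ cross-correlation exceeds the DZ one in these made-up inputs, the estimated genetic covariance is positive. With real data, sampling error can push such simple estimates outside the valid range, which is one reason model-fitting approaches are preferred.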

Genetic correlations are affected by methodological concerns; underestimation of heritability, such as due to assortative mating, will lead to overestimates of longitudinal genetic correlation,[70] and moderate levels of misdiagnoses can create pseudo correlations.[71]

As they are affected by heritabilities of both traits, genetic correlations have low statistical power, especially in the presence of measurement errors biasing heritability downwards, because "estimates of genetic correlations are usually subject to rather large sampling errors and therefore seldom very precise": the standard error of an estimate ${\displaystyle r_{g}}$ is ${\displaystyle \sigma (r_{g})={\frac {1-r_{g}^{2}}{\sqrt {2}}}\cdot {\sqrt {\frac {\sigma (h_{x}^{2})\cdot \sigma (h_{y}^{2})}{h_{x}^{2}\cdot h_{y}^{2}}}}}$.[72] (Larger genetic correlations & heritabilities will be estimated more precisely.[73]) However, inclusion of genetic correlations in an analysis of a pleiotropic trait can boost power for the same reason that multivariate regressions are more powerful than separate univariate regressions.[74]
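The standard error formula above can be evaluated directly. The following Python sketch only transcribes that formula; the function name and all input values are hypothetical and chosen to show how the precision changes with the inputs.

```python
import math


def se_genetic_correlation(r_g, h2_x, h2_y, se_h2_x, se_h2_y):
    """Approximate standard error of r_g, transcribing the formula in the text:
    (1 - r_g^2) / sqrt(2) * sqrt(se(h2_x) * se(h2_y) / (h2_x * h2_y))."""
    return ((1 - r_g ** 2) / math.sqrt(2)
            * math.sqrt((se_h2_x * se_h2_y) / (h2_x * h2_y)))


# Hypothetical inputs: moderate heritabilities with standard errors of 0.10.
baseline = se_genetic_correlation(0.5, 0.5, 0.5, 0.10, 0.10)

# Halving the heritability standard errors (e.g. with a larger sample).
tighter_h2 = se_genetic_correlation(0.5, 0.5, 0.5, 0.05, 0.05)

# A larger genetic correlation, holding everything else fixed.
larger_rg = se_genetic_correlation(0.9, 0.5, 0.5, 0.10, 0.10)

print(f"baseline SE:              {baseline:.3f}")
print(f"more precise h2 estimates: {tighter_h2:.3f}")
print(f"larger r_g:                {larger_rg:.3f}")
```

With these inputs the standard error shrinks both when the heritabilities are estimated more precisely and when the genetic correlation is larger, matching the parenthetical remark in the text.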

Twin methods have the advantage of being usable without detailed biological data: human genetic correlations were calculated as far back as the 1970s and animal/plant genetic correlations in the 1930s, and sample sizes in the hundreds are enough for them to be well-powered. Their disadvantages are that they make assumptions which have been criticized, that they can only be estimated with access to the twin data, and that for rare traits like anorexia nervosa it may be difficult to find enough twins with a diagnosis to make meaningful cross-twin comparisons. Molecular genetic methods like GCTA or LD score regression have the advantage of not requiring specific degrees of relatedness, so they can easily study rare traits using case-control designs, which also reduces the number of assumptions they rely on. However, those methods could not be run until recently, require large sample sizes in the thousands or hundreds of thousands (to obtain precise SNP heritability estimates; see the standard error formula), and may require individual-level genetic data (in the case of GCTA but not LD score regression).

More concretely, if two traits, say height and weight, have the following additive genetic variance-covariance matrix:

| | Height | Weight |
|---|---|---|
| Height | 36 | 36 |
| Weight | 36 | 117 |

Then the genetic correlation is .55, as seen in the standardized matrix below:

| | Height | Weight |
|---|---|---|
| Height | 1 | |
| Weight | .55 | 1 |

In practice, structural equation modeling applications such as Mx or OpenMx (and before that, historically, LISREL[75]) are used to calculate both the genetic covariance matrix and its standardized form. In R, cov2cor() will standardize the matrix.
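As a rough Python stand-in for R's `cov2cor()`, the sketch below standardizes the height/weight genetic covariance matrix given above by dividing each entry by the product of the two corresponding genetic standard deviations. The function name `cov_to_cor` is illustrative and not part of any library.

```python
import math


def cov_to_cor(cov):
    """Convert a covariance matrix to a correlation matrix:
    cor[i][j] = cov[i][j] / (sd[i] * sd[j]), with sd = sqrt(diagonal)."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Additive genetic variance-covariance matrix for height and weight
# (values from the example in the text).
genetic_cov = [
    [36.0, 36.0],
    [36.0, 117.0],
]

genetic_cor = cov_to_cor(genetic_cov)
r_g = genetic_cor[0][1]  # off-diagonal entry is the genetic correlation

print(f"genetic correlation between height and weight: {r_g:.2f}")
```

The off-diagonal entry equals 36 / sqrt(36 × 117), which rounds to the .55 reported in the standardized matrix.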

Typically, published reports will provide genetic variance components that have been standardized as a proportion of total variance (for instance, in an ACE twin study model, standardized as a proportion of V-total = A+C+E). In this case, the metric for computing the genetic covariance (the variance within the genetic covariance matrix) is lost because of the standardizing process, so the genetic correlation of two traits cannot readily be estimated from such published models. Multivariate models (such as the Cholesky decomposition[better source needed]) will, however, allow the reader to see shared genetic effects (as opposed to the genetic correlation) by following path rules. It is therefore important to provide the unstandardized path coefficients in publications.

## References

1.
2.
3. Neale & Maes 1996, Methodology for genetics studies of twins and families (6th ed.). Dordrecht, The Netherlands: Kluwer.
4. p. 123 of Plomin 2012
5. pp. 194–195 of Jensen 1980, Bias in Mental Testing
6. Martin & Eaves 1977, "The Genetical Analysis of Covariance Structure"
7. Eaves et al 1978, "Model-fitting approaches to the analysis of human behaviour"
8. Loehlin & Vandenberg 1968, "Genetic and environmental components in the covariation of cognitive abilities: An additive model", in Progress in Human Behaviour Genetics, ed. S. G. Vandenberg, pp. 261–278. Johns Hopkins, Baltimore.
9.
10.
11.
12. Cheverud 1988, "A comparison of genetic and phenotypic correlations"
13. Krapohl et al 2015, "Phenome-wide analysis of genome-wide polygenic scores"
14.
15.
16. Zheng et al 2016, "LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis"
17. Sivakumaran et al 2011, "Abundant pleiotropy in human complex diseases and traits"
18. Solovieff et al 2013, "Pleiotropy in complex traits: challenges and strategies"
19. Cotsapas et al 2011, "Pervasive sharing of genetic effects in autoimmune disease"
20.
21.
22. Canela-Xandri et al 2017, "An atlas of genetic associations in UK Biobank"
23.
24.
25. Pickrell 2015, "Fulfilling the promise of Mendelian randomization"
26.
27.
28. Hagenaars et al 2016b, "Cognitive ability and physical health: a Mendelian randomization study"
29.
30. Verbanck et al 2017, "Widespread pleiotropy confounds causal relationships between complex traits and diseases inferred from Mendelian randomization"
31. E.g. Falconer cites the example of chicken size and egg laying: chickens grown large for genetic reasons lay later, fewer, and larger eggs, while chickens grown large for environmental reasons lay quicker and more but normal sized eggs (p. 315 of Falconer 1960); Falconer, in Table 19.1 on p. 316, also provides examples of opposite-signed phenotypic & genetic correlations: fleece-weight/length-of-wool & fleece-weight/body-weight in sheep, and body-weight/egg-timing & body-weight/egg-production in chickens. One consequence of the negative chicken correlations was that, despite moderate heritabilities and a positive phenotypic correlation, selection had begun to fail to yield any improvements (p. 329) according to Dickerson 1955, "Genetic slippage in response to selection for multiple objectives".
32. Kruuk, Loeske E. B.; Slate, Jon; Pemberton, Josephine M.; Brotherstone, Sue; Guinness, Fiona; Clutton-Brock, Tim (2002). "Antler Size in Red Deer: Heritability and Selection but No Evolution". Evolution. 56 (8): 1683–95. doi:10.1111/j.0014-3820.2002.tb01480.x. PMID 12353761.
33. Cheverud 1988, "A comparison of genetic and phenotypic correlations"
34.
35.
36.
37.
38.
39. p. 397 of Plomin et al 2012
40.
41. Munafo et al 2016, "Collider Scope: How selection bias can induce spurious associations"
42.
43.
44. "The substantial comorbidity between specific cognitive disabilities is largely due to genetic factors, meaning that the same genes affect different learning disabilities although there are also disability-specific genes." pp. 184–185 of Plomin et al 2012
45.
46.
47. Krapohl et al 2017, "Multi-polygenic score approach to trait prediction"
48.
49. Turley et al 2017, "MTAG: Multi-Trait Analysis of GWAS"
50.
51.
52.
53.
54.
55. Hazel & Lush 1943, "The efficiency of three methods of selection"
56.
57. Falconer 1960, pp. 324–329
58.
59. Beldade et al 2002, "Developmental constraints versus flexibility in morphological evolution"
60.
61.
62. Bulik-Sullivan et al 2015, "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies" (see also Shi et al 2016); LDSC
63.
64.
65.
66.
67.
68.
69. Posthuma & Boomsma 2000, "A note on the statistical power in extended twin designs"
70. DeFries et al 1987, "Genetic Stability of Cognitive Development From Childhood to Adulthood"
71.
72. pp. 317–318 of Falconer 1960
73.
74.
75.