Heritability

From Wikipedia, the free encyclopedia
Heritability asks how much genetics contributes to differences in height between people. This is not the same as asking how much genetics influences height in any one person.

Heritability is the proportion of observed differences on a trait among individuals of a population that is due to genetic differences. Factors including genetics, environment and random chance can all contribute to the variation between individuals in their observable characteristics (their "phenotypes").[1] Heritability thus analyzes the relative contributions of differences in genetic and non-genetic factors to the total phenotypic variance in a population. For instance, some humans in a population are taller than others; heritability attempts to identify how much of that variation in height is attributable to genetic differences.

Heritability is measured by estimating the relative contributions of genetic and non-genetic differences to the total phenotypic variation in a population. Heritability is an important concept in quantitative genetics, particularly in selective breeding and behavior genetics (for instance, twin studies), but is less widely used in population genetics.

Heritability measures the fraction of phenotypic variability that can be attributed to genetic variation. This is not the same as saying that this fraction of an individual's phenotype is caused by genetics. In addition, heritability can change without any genetic change occurring (e.g. when the environment starts contributing more variation). As a case in point, consider that both genes and environment have the potential to influence intelligence. Heritability could increase if genetic variation increases, causing individuals to show more phenotypic variation (e.g. to show different levels of intelligence). On the other hand, heritability might also increase if the environmental variation decreases, causing individuals to show less phenotypic variation (e.g. to show more similar levels of intelligence). In either case heritability increases, because genetics contribute more variation or because non-genetic factors contribute less; what matters is the relative contribution. This is why heritability is specific to a particular population in a particular environment.

The extent of dependence of phenotype on environment can also be a function of the genes involved. Matters of heritability are complicated because genes may canalize a phenotype, making its expression almost inevitable in all occurring environments. Individuals with the same genotype can also exhibit different phenotypes through a mechanism called phenotypic plasticity, which makes heritability difficult to measure in some cases. Recent insights in molecular biology have identified changes in transcriptional activity of individual genes associated with environmental changes. However, there are a large number of genes whose transcription is not affected by the environment.[2]

Overview

An example of low heritability: a population with genotypes coding for only one hair colour
A crowd with variance in hair colour.

Estimates of heritability use statistical analyses to help identify the causes of differences between individuals. Since heritability is concerned with variance, it is necessarily an account of the differences between individuals in a population. Heritability can be univariate (examining a single trait) or multivariate (examining the genetic and environmental associations between multiple traits at once). The multivariate case allows a test of the genetic overlap between different phenotypes: for instance, hair color and eye color. Environment and genetics may also interact, and heritability analyses can test for and examine these interactions (GxE models).

A prerequisite for heritability analyses is that there is some population variation to account for. In practice, all traits vary and almost all traits show some heritability.[3] By contrast, in a hypothetical population with no diversity in hair color, the "heritability" of hair color would be undefined. In populations with varying values of a trait (as in the crowd pictured above), variance could be due to environment (hair dye, for instance) or genetic differences, and heritability could range from 0 to 100%.

The requirement for variation highlights the fact that heritability cannot take into account the effect of factors which are invariant in the population. Factors may be invariant because they are absent from the population (e.g. no one has access to a particular antibiotic) or because they are omnipresent (e.g. everyone drinks coffee).

Definition

Any particular phenotype can be modeled as the sum of genetic and environmental effects:[4]

Phenotype (P) = Genotype (G) + Environment (E).

Likewise, the variance in the trait, Var(P), is the sum of the genetic and environmental variances plus twice their covariance:

Var(P) = Var(G) + Var(E) + 2 Cov(G,E).

In a planned experiment Cov(G,E) can be controlled and held at 0. In this case, heritability is defined as:

H^2 = \frac{\mathrm{Var}(G)}{\mathrm{Var}(P)} .

H^2 is the broad-sense heritability. This reflects all the genetic contributions to a population's phenotypic variance, including additive, dominance, and epistatic (multi-genic interaction) effects, as well as maternal and paternal effects, where individuals are directly affected by their parents' phenotype (such as with milk production in mammals).
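As a minimal sketch of this definition, the following computes H^2 from variance components under the stated assumption that Cov(G,E) = 0. The function name and the numbers are hypothetical and chosen only for illustration; they show that the ratio changes when only the environmental variance changes.

```python
def broad_sense_heritability(var_g, var_e):
    """Return H^2 = Var(G) / Var(P), assuming Cov(G, E) = 0 so Var(P) = Var(G) + Var(E)."""
    var_p = var_g + var_e
    if var_p <= 0:
        # With no phenotypic variance the ratio is undefined.
        raise ValueError("heritability is undefined when Var(P) is zero")
    return var_g / var_p


# Hypothetical variance components: same genetic variance, two environments.
h2_variable_env = broad_sense_heritability(var_g=4.0, var_e=4.0)
h2_uniform_env = broad_sense_heritability(var_g=4.0, var_e=1.0)
print(h2_variable_env, h2_uniform_env)
```

With the genetic variance held fixed, reducing the environmental variance raises the ratio, which is why an estimate applies only to a particular population in a particular environment.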

A particularly important component of the genetic variance is the additive variance, Var(A), which is the variance due to the average effects (additive effects) of the alleles. Since each parent passes a single allele per locus to each offspring, parent-offspring resemblance depends upon the average effect of single alleles. Additive variance therefore represents the genetic component of variance responsible for parent-offspring resemblance. The additive genetic portion of the phenotypic variance is known as narrow-sense heritability and is defined as

h^2 = \frac{\mathrm{Var}(A)}{\mathrm{Var}(P)}

An upper-case H^2 is used to denote broad-sense heritability, and a lower-case h^2 to denote narrow-sense heritability.

Additive variance is important for selection. If a selective pressure such as improving livestock is exerted, the response of the trait is directly related to narrow-sense heritability. The mean of the trait will increase in the next generation as a function of how much the mean of the selected parents differs from the mean of the population from which the selected parents were chosen. The observed response to selection leads to an estimate of the narrow-sense heritability (called realized heritability). This is the principle underlying artificial selection or breeding.

Example

Figure 1. Relationship of phenotypic values to additive and dominance effects using a completely dominant locus.

The simplest genetic model involves a single locus with two alleles (b and B) affecting one quantitative phenotype.

The number of B alleles can be 0, 1, or 2. For any genotype, B_iB_j, the expected phenotype can then be written as the sum of the overall mean, a linear effect, and a dominance deviation:

P_{ij} = \mu + \alpha_i + \alpha_j + d_{ij},

where \mu is the population mean, a_{ij} = \alpha_i + \alpha_j is the additive effect, and d_{ij} is the dominance deviation.

The additive genetic variance at this locus is the weighted average of the squares of the additive effects:

\mathrm{Var}(A) = f(bb)a^2_{bb}+f(Bb)a^2_{Bb}+f(BB)a^2_{BB},

where f(bb)a_{bb}+f(Bb)a_{Bb}+f(BB)a_{BB} = 0.

There is a similar relationship for variance of dominance deviations:

\mathrm{Var}(D) = f(bb)d^2_{bb}+f(Bb)d^2_{Bb}+f(BB)d^2_{BB},

where f(bb)d_{bb}+f(Bb)d_{Bb}+f(BB)d_{BB} = 0.

The linear regression of phenotype on genotype is shown in Figure 1.
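The decomposition above can be sketched numerically. The code below is an illustration, not part of the original treatment: it assumes Hardy-Weinberg genotype frequencies for a B-allele frequency p, takes hypothetical genotypic values for bb, Bb and BB, obtains the additive effects as the fitted values of the frequency-weighted regression of genotypic value on the number of B alleles, and treats the residuals as dominance deviations.

```python
def single_locus_variances(p, values):
    """Return (Var(A), Var(D)) for one locus with alleles b and B.

    p      -- frequency of allele B (Hardy-Weinberg proportions are assumed)
    values -- genotypic values for (bb, Bb, BB); hypothetical inputs
    """
    q = 1.0 - p
    freqs = [q * q, 2.0 * p * q, p * p]   # f(bb), f(Bb), f(BB)
    counts = [0, 1, 2]                    # number of B alleles per genotype

    mean_value = sum(f * v for f, v in zip(freqs, values))
    mean_count = sum(f * n for f, n in zip(freqs, counts))

    # Frequency-weighted least-squares slope of genotypic value on allele count.
    cov = sum(f * (n - mean_count) * (v - mean_value)
              for f, n, v in zip(freqs, counts, values))
    var_count = sum(f * (n - mean_count) ** 2 for f, n in zip(freqs, counts))
    slope = cov / var_count

    # Additive effects a_ij (centered, so their weighted mean is zero)
    # and dominance deviations d_ij (the regression residuals).
    additive = [slope * (n - mean_count) for n in counts]
    dominance = [v - mean_value - a for v, a in zip(values, additive)]

    var_a = sum(f * a * a for f, a in zip(freqs, additive))
    var_d = sum(f * d * d for f, d in zip(freqs, dominance))
    return var_a, var_d


# Completely dominant locus: Bb has the same value as BB.
var_a, var_d = single_locus_variances(p=0.5, values=(-1.0, 1.0, 1.0))
print(var_a, var_d)
```

For this completely dominant locus the two components add up to the total genetic variance at the locus, so dominance does not remove additive variance; it only adds a residual component around the regression line.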

Estimating heritability

Since only P can be observed or measured directly, heritability must be estimated from the similarities observed in subjects varying in their level of genetic or environmental similarity. The statistical analyses required to estimate the genetic and environmental components of variance depend on the sample characteristics. Briefly, better estimates are obtained using data from individuals with widely varying levels of genetic relationship, such as twins, siblings, parents and offspring, rather than from more distantly related (and therefore less similar) subjects. The standard error of heritability estimates decreases with larger sample sizes.

In non-human populations it is often possible to collect information in a controlled way. For example, among farm animals it is easy to arrange for a bull to produce offspring from a large number of cows and to control environments. Such experimental control is generally not possible when gathering human data, which must rely on naturally occurring relationships and environments.

Studies of human heritability often utilize adoption study designs, often with identical twins who have been separated early in life and raised in different environments (see for example Fig. 2). Such individuals have identical genotypes and can be used to separate the effects of genotype and environment. Limitations of this design are the common prenatal environment and the relatively low number of twins reared apart. A second and more common design is the twin study, in which the similarity of identical and fraternal twins is used to estimate heritability. These studies can be limited by the fact that identical twins are not completely genetically identical, potentially resulting in an underestimation of heritability. Studies of twins also examine differences between twins and non-twin siblings, for instance to examine phenomena such as intrauterine competition (for example, twin-to-twin transfusion syndrome).

Heritability estimates are always relative to the genetic and environmental factors in the population, and are not absolute measurements of the contribution of genetic and environmental factors to a phenotype. Heritability estimates reflect the amount of variation in genotypic effects compared to variation in environmental effects.

Heritability can be made larger by diversifying the genetic background, e.g., by using only very outbred individuals (which increases Var(G)), and/or by minimizing environmental effects (which decreases Var(E)). The converse also holds. Due to such effects, different populations of a species might have different heritabilities for the same trait.

In observational studies, or because of evocative effects (where a genome evokes environments by its effect on them), G and E may covary: this is gene-environment correlation. Depending on the methods used to estimate heritability, correlations between genetic factors and shared or non-shared environments may or may not be confounded with heritability.[5]

Heritability estimates are often misinterpreted if it is not understood that they refer to the proportion of variation between individuals in a population that is influenced by genetic factors. Heritability describes the population, not individuals within that population. For example, it is incorrect to say that because the heritability of a personality trait is about 0.6, 60% of your personality is inherited from your parents and 40% comes from the environment.

A highly heritable trait (such as eye color) assumes environmental inputs which, though they are invariant in most populations, are required for development: for instance, temperatures and atmospheres supporting life. A more useful distinction than "nature vs. nurture" is "obligate vs. facultative": under typical environmental ranges, which traits are more "obligate" (e.g., the nose; everyone has a nose) or more "facultative" (sensitive to environmental variations, such as the specific language learned during infancy). Another useful distinction is between traits that are likely to be adaptations (such as the nose) vs. those that are byproducts of adaptations (such as the white color of bones), or are due to random variation (non-adaptive variation in, say, nose shape or size).

Estimation methods

There are essentially two schools of thought regarding estimation of heritability.

One school of thought was developed by Sewall Wright at The University of Chicago, and further popularized by C. C. Li (University of Chicago) and J. L. Lush (Iowa State University). It is based on the analysis of correlations and, by extension, regression. Path Analysis was developed by Sewall Wright as a way of estimating heritability.

The second was originally developed by R. A. Fisher and expanded at The University of Edinburgh, Iowa State University, and North Carolina State University, as well as other schools. It is based on the analysis of variance of breeding studies, using the intraclass correlation of relatives. Various methods of estimating components of variance (and, hence, heritability) from ANOVA are used in these analyses.

Regression/correlation methods of estimation

The first school of estimation uses regression and correlation to estimate heritability.

Selection experiments

Figure 3. Strength of selection (S) and response to selection (R) in an artificial selection experiment, h^2 = R/S.

In an artificial selection experiment, both the strength of selection S (the difference in mean trait between the population as a whole and the selected parents of the next generation, also called the selection differential[6]) and the response to selection R (the difference between the offspring mean and the mean of the whole parental generation) can be calculated. The realized heritability is then the response to selection relative to the strength of selection, h^2 = R/S, as in Fig. 3.
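The calculation described above can be sketched from raw trait measurements. This is only an illustration: the function name and the trait values are hypothetical, and the selected parents are assumed to be a subset of the measured parental generation.

```python
def realized_heritability(parent_generation, selected_parents, offspring):
    """Estimate realized heritability h^2 = R / S from lists of trait values."""
    def mean(xs):
        return sum(xs) / len(xs)

    population_mean = mean(parent_generation)
    # S: selected parents versus the whole parental generation.
    selection_differential = mean(selected_parents) - population_mean
    # R: offspring versus the whole parental generation.
    response = mean(offspring) - population_mean
    if selection_differential == 0:
        raise ValueError("no selection was applied (S = 0), so R / S is undefined")
    return response / selection_differential


# Hypothetical trait values (arbitrary units).
parents = [8.0, 9.0, 10.0, 11.0, 12.0]    # mean 10
selected = [11.0, 12.0, 13.0]             # mean 12, so S = 2
next_generation = [10.0, 11.0, 12.0]      # mean 11, so R = 1
print(realized_heritability(parents, selected, next_generation))
```

A single generation of selection gives a noisy estimate; in practice the ratio is usually accumulated over several generations, which this sketch does not attempt.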

Comparison of close relatives

In the comparison of relatives, we find that in general,

h^2 = \frac{b}{r} = \frac{t}{r},

where r can be thought of as the coefficient of relatedness, b is the coefficient of regression and t is the coefficient of correlation.

Parent-offspring regression

Figure 4. Sir Francis Galton's (1889) data showing the relationship between offspring height (928 individuals) as a function of mean parent height (205 sets of parents).

Heritability may be estimated by comparing parent and offspring traits (as in Fig. 4). The slope of the line (0.57) approximates the heritability of the trait when offspring values are regressed against the average trait in the parents. If only one parent's value is used, then heritability is twice the slope. (Note that this is the source of the term "regression", since the offspring values always tend to regress to the mean value for the population, i.e., the slope is always less than one.) This regression effect also underlies the DeFries-Fulker method for analyzing twins selected for one member being affected.[7]
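A minimal sketch of this estimate follows. The ordinary least-squares slope is computed by hand, and the heights are invented for illustration; they are not Galton's data.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def heritability_from_regression(parent_values, offspring_values, midparent=True):
    """Estimate h^2 from the offspring-on-parent regression slope.

    With mid-parent values the slope itself is the estimate; with a single
    parent's values the estimate is twice the slope.
    """
    slope = regression_slope(parent_values, offspring_values)
    return slope if midparent else 2.0 * slope


# Hypothetical mid-parent and offspring heights (inches).
midparent_height = [64.0, 66.0, 68.0, 70.0, 72.0]
offspring_height = [66.0, 67.0, 68.0, 69.0, 70.0]
print(heritability_from_regression(midparent_height, offspring_height))
```

In these made-up data, offspring of tall parents are taller than average but less extreme than their parents, which is the regression toward the mean described above.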

Sibling comparison

A basic approach to heritability can be taken using full-sib designs: comparing similarity between siblings who share both a biological mother and a father.[8] When there is only additive gene action, this sibling phenotypic correlation is an index of familiality: the sum of half the additive genetic variance plus the full effect of the common environment. It thus places an upper limit on additive heritability of twice the full-sib phenotypic correlation. Half-sib designs compare phenotypic traits of siblings that share one parent with other sibling groups.

Twin studies

Main article: Twin study
Figure 5. Twin concordances for seven psychological traits (sample size shown inside bars), with DZ being fraternal and MZ being identical twins.

Heritability for traits in humans is most frequently estimated by comparing resemblances between twins (Fig. 2 & 5). "The advantage of twin studies, is that the total variance can be split up into genetic, shared or common environmental, and unique environmental components, enabling an accurate estimation of heritability".[9] Fraternal or dizygotic (DZ) twins on average share half their genes (assuming there is no assortative mating for the trait), and so identical or monozygotic (MZ) twins on average are twice as genetically similar as DZ twins. A crude estimate of heritability, then, is approximately twice the difference in correlation between MZ and DZ twins, i.e. Falconer's formula H^2 = 2(r(MZ) - r(DZ)).

The effect of shared environment, c^2, contributes to similarity between siblings due to the commonality of the environment they are raised in. Shared environment is approximated by the DZ correlation minus half the heritability (half being the degree to which DZ twins share the same genes): c^2 = r(DZ) - \frac{1}{2}h^2. Unique environmental variance, e^2, reflects the degree to which identical twins raised together are dissimilar: e^2 = 1 - r(MZ).
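The three quantities can be computed together from the two twin correlations. The sketch below applies Falconer's formula and the two expressions for shared and unique environment; the function name and the correlations are hypothetical.

```python
def falconer_decomposition(r_mz, r_dz):
    """Crude twin-based estimates from MZ and DZ twin correlations.

    Returns (heritability, shared environment c^2, unique environment e^2).
    """
    heritability = 2.0 * (r_mz - r_dz)   # Falconer's formula
    c2 = r_dz - 0.5 * heritability       # DZ correlation minus half the heritability
    e2 = 1.0 - r_mz                      # dissimilarity of identical twins
    return heritability, c2, e2


# Hypothetical twin correlations for some trait.
h2, c2, e2 = falconer_decomposition(r_mz=0.8, r_dz=0.5)
print(h2, c2, e2)
```

The three estimates sum to one by construction. When the MZ correlation exceeds twice the DZ correlation, the shared-environment estimate comes out negative, one sign that these crude formulas do not fit the data.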

The methodology of the classical twin study has been criticized, but some of these criticisms do not take into account the methodological innovations and refinements described above.

Extended pedigree design

Heritability is often analyzed within single generations, by comparing MZ twins raised apart or by comparing the similarity of MZ and DZ twins, but considerable power can be gained using more complex relationships. By studying a trait in multi-generational families, the multiple recombinations of genetic and environmental effects can be decomposed using software such as ASReml, and heritability estimated.[10] This design is especially powerful for untangling confounds such as reverse causality, maternal effects such as the prenatal environment, and confounding of genetic dominance, shared environment, and maternal gene effects.[11][12]

Analysis of variance methods of estimation

The second set of methods of estimation of heritability involves ANOVA and estimation of variance components.

Basic model

We use the basic discussion of Kempthorne (1957 [1969]). Considering only the most basic of genetic models, we can look at the quantitative contribution of a single locus with genotype G_i as

y_i = \mu + g_i + e

where

g_i is the effect of genotype G_i

and e is the environmental effect.

Consider an experiment with a group of sires and their progeny from random dams. Since the progeny get half of their genes from the father and half from their (random) mother, the progeny equation is

z_i = \mu + \frac{1}{2}g_i + e

Intraclass correlations

Consider the experiment above. We have two groups of progeny we can compare. The first is comparing the various progeny for an individual sire (called within sire group). The variance will include terms for genetic variance (since they did not all get the same genotype) and environmental variance. This is thought of as an error term.

The second group of progeny are comparisons of means of half sibs with each other (called among sire group). In addition to the error term as in the within sire groups, we have an additional term due to the differences among different means of half sibs. The intraclass correlation is

\mathrm{corr}(z,z') = \mathrm{corr}(\mu + \frac{1}{2}g + e, \mu + \frac{1}{2}g + e') = \frac{1}{4}V_g ,

since environmental effects are independent of each other.

The ANOVA

In an experiment with n sires and r progeny per sire, we can calculate the following ANOVA, using V_g as the genetic variance and V_e as the environmental variance:

Table 1: ANOVA for Sire experiment
Source             | d.f.   | Mean Square | Expected Mean Square
Among sire groups  | n-1    | S           | \frac{3}{4}V_g + V_e + r(\frac{1}{4}V_g)
Within sire groups | n(r-1) | W           | \frac{3}{4}V_g + V_e

The \frac{1}{4}V_g term is the intraclass correlation among half sibs. We can easily calculate H^2 = \frac{V_g}{V_g+V_e} = \frac{4(S-W)}{S+(r-1)W}. The Expected Mean Square is calculated from the relationship of the individuals (progeny within a sire are all half-sibs, for example), and an understanding of intraclass correlations.
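The algebra behind that ratio can be checked with a short sketch. It solves the two expected mean squares of Table 1 for V_g and V_e and then forms H^2; the function names and the mean squares are hypothetical.

```python
def sire_variance_components(ms_among, ms_within, progeny_per_sire):
    """Solve the expected mean squares of Table 1 for (V_g, V_e).

    S - W = r * (1/4) * V_g   and   W = (3/4) * V_g + V_e
    """
    v_g = 4.0 * (ms_among - ms_within) / progeny_per_sire
    v_e = ms_within - 0.75 * v_g
    return v_g, v_e


def sire_model_heritability(ms_among, ms_within, progeny_per_sire):
    """H^2 = 4(S - W) / (S + (r - 1) W)."""
    s, w, r = ms_among, ms_within, progeny_per_sire
    return 4.0 * (s - w) / (s + (r - 1) * w)


# Hypothetical mean squares from an experiment with r = 5 progeny per sire.
v_g, v_e = sire_variance_components(9.0, 4.0, 5)
print(v_g, v_e, sire_model_heritability(9.0, 4.0, 5))
```

Both routes give the same value, since V_g / (V_g + V_e) computed from the solved components equals the closed-form ratio in S, W and r.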

Model with additive and dominance terms

For a model with additive and dominance terms, but not others, the equation for a single locus is

y_{ij} = \mu + \alpha_i + \alpha_j + d_{ij} + e,

where

\alpha_i is the additive effect of the ith allele, \alpha_j is the additive effect of the jth allele, d_{ij} is the dominance deviation for the ijth genotype, and e is the environment.

Experiments can be run with a similar setup to the one given in Table 1. Using different relationship groups, we can evaluate different intraclass correlations. Using V_a as the additive genetic variance and V_d as the dominance deviation variance, intraclass correlations become linear functions of these parameters. In general,

Intraclass correlation = r V_a + \theta V_d,

where r and \theta are found as

r = P[ alleles drawn at random from the relationship pair are identical by descent], and

\theta = P[ genotypes drawn at random from the relationship pair are identical by descent].

Some common relationships and their coefficients are given in Table 2.

Table 2: Coefficients for calculating variance components
Relationship         | r           | \theta
Identical Twins      | 1           | 1
Parent-Offspring     | \frac{1}{2} | 0
Half Siblings        | \frac{1}{4} | 0
Full Siblings        | \frac{1}{2} | \frac{1}{4}
First Cousins        | \frac{1}{8} | 0
Double First Cousins | \frac{1}{4} | \frac{1}{16}
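As an illustration of how the coefficients in Table 2 are used, the sketch below evaluates r V_a + \theta V_d for each relationship. The dictionary keys, the function name and the variance components are assumptions made for this example.

```python
from fractions import Fraction

# (r, theta) pairs copied from Table 2.
COEFFICIENTS = {
    "identical twins": (Fraction(1), Fraction(1)),
    "parent-offspring": (Fraction(1, 2), Fraction(0)),
    "half siblings": (Fraction(1, 4), Fraction(0)),
    "full siblings": (Fraction(1, 2), Fraction(1, 4)),
    "first cousins": (Fraction(1, 8), Fraction(0)),
    "double first cousins": (Fraction(1, 4), Fraction(1, 16)),
}


def expected_resemblance(relationship, var_a, var_d):
    """Return r * V_a + theta * V_d for the named relationship."""
    r, theta = COEFFICIENTS[relationship]
    return float(r * var_a + theta * var_d)


# Hypothetical additive and dominance variances.
for name in COEFFICIENTS:
    print(name, expected_resemblance(name, var_a=2.0, var_d=4.0))
```

Because parent-offspring and half-sibling pairs have \theta = 0, their resemblance depends on V_a alone, while full siblings also pick up a quarter of V_d. Comparing such groups is what allows the two components to be separated.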

Larger models

When a large, complex pedigree is available for estimating heritability, the most efficient use of the data is in a restricted maximum likelihood (REML) model. The raw data will usually have three or more data points for each individual: a code for the sire, a code for the dam and one or several trait values. Different trait values may be for different traits or for different time points of measurement.

The currently popular methodology relies on high degrees of certainty over the identities of the sire and dam; it is not common to treat the sire identity probabilistically. This is not usually a problem, since the methodology is rarely applied to wild populations (although it has been used for several wild ungulate and bird populations), and sires are invariably known with a very high degree of certainty in breeding programmes. There are also algorithms that account for uncertain paternity.

The pedigrees can be viewed using programs such as Pedigree Viewer, and analyzed with programs such as ASReml, VCE, WOMBAT or the BLUPF90 family of programs.

Response to selection

In selective breeding of plants and animals, the expected response to selection of a trait with known narrow-sense heritability h^2 can be estimated using the breeder's equation:[13]

 R = h^2 S

In this equation, the Response to Selection (R) is defined as the realized average difference between the parent generation and the next generation, and the Selection Differential (S) is defined as the average difference between the parent generation and the selected parents.

For example, imagine that a plant breeder is involved in a selective breeding project with the aim of increasing the number of kernels per ear of corn. For the sake of argument, let us assume that the average ear of corn in the parent generation has 100 kernels. Let us also assume that the selected parents produce corn with an average of 120 kernels per ear. If h^2 equals 0.5, then the next generation will produce corn with an average of 0.5(120-100) = 10 additional kernels per ear. Therefore, the number of kernels per ear of corn will equal, on average, 110.

Note that heritability in the above equation is equal to the ratio \mathrm{Var}(A)/\mathrm{Var}(P) only if the genotype and the environmental noise follow Gaussian distributions.
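The corn example can be reproduced with a few lines of code. This is a sketch of the breeder's equation only; the function name is hypothetical and the numbers are those of the example above.

```python
def predicted_offspring_mean(population_mean, selected_parent_mean, h2):
    """Apply the breeder's equation R = h^2 * S and return the expected offspring mean."""
    selection_differential = selected_parent_mean - population_mean   # S
    response = h2 * selection_differential                            # R
    return population_mean + response


# Parent generation averages 100 kernels per ear, selected parents 120, h^2 = 0.5.
print(predicted_offspring_mean(100.0, 120.0, 0.5))  # 110.0
```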

Controversies

Some authors, such as Steven Rose[14] and Jay Joseph[15], have dismissed heritability estimates as useless. Criticisms have also been raised with regard to the lack of progress in identifying causal genetic variants for many traits (the missing heritability problem). A 2008 paper in Nature Reviews Genetics stated, however: "Despite continuous misunderstandings and controversies over its use and application, heritability remains key to the response to selection in evolutionary biology and agriculture, and to the prediction of disease risk in medicine. Recent reports of substantial heritability for gene expression and new estimation methods using marker data highlight the relevance of heritability in the genomics era."[12] According to Eric Turkheimer, new molecular genetic methods of heritability estimation have demonstrated that the underlying assumptions of the classic twin design of human behavioral genetics are sound, "putting an end to an entire misguided school of thought among traditional opponents of classical quantitative (and by association behavioral) genetics."[16]

References

Notes

  1. Raj, A.; Van Oudenaarden, A. (2008). "Nature, Nurture, or Chance: Stochastic Gene Expression and Its Consequences". Cell 135 (2): 216–226. doi:10.1016/j.cell.2008.09.050. PMC 3118044. PMID 18957198.
  2. Wills, C. (2007). "Principles of Population Genetics, 4th edition". Journal of Heredity (Book Review) 98 (4): 382. doi:10.1093/jhered/esm035.
    • review of: Hartl, Daniel L.; Clark, Andrew G. (2007). Principles of Population Genetics (4th ed.). Sunderland, MA: Sinauer and Associates. pp. xv + 652. ISBN 0-87893-308-5.
  3. Turkheimer, E. (2000). "Three laws of behavior genetics and what they mean". Current Directions in Psychological Science 9 (5): 160–164. doi:10.1111/1467-8721.00084.
  4. Kempthorne 1957
  5. Cattell, R. B. (1960). "The multiple abstract variance analysis equations and solutions: for nature–nurture research on continuous variables". Psychol Rev 67 (6): 353–372. doi:10.1037/h0043487. PMID 13691636.
  6. Kempthorne 1957, p. 507; or Falconer & Mackay 1995, p. 191, for example.
  7. DeFries, J. C.; Fulker, D. W. (September 1985). "Multiple regression analysis of twin data". Behavior Genetics 15 (5): 467–473. doi:10.1007/BF01066239. PMID 4074272.
  8. Falconer, Douglas S.; Mackay, Trudy F. C. (December 1995). Introduction to Quantitative Genetics (4th ed.). Longman. ISBN 978-0582243026.
  9. Gielen, M.; Lindsey, P. J.; Derom, C.; Smeets, H. J. M.; Souren, N. Y.; Paulussen, A. D. C.; Derom, R.; Nijhuis, J. G. (2008). "Modeling Genetic and Environmental Factors to Increase Heritability and Ease the Identification of Candidate Genes for Birth Weight: A Twin Study". Behavior Genetics 38: 44–54 (quotation from p. 45). doi:10.1007/s10519-007-9170-3.
  10. Luciano, M.; Batty, G. D.; McGilchrist, M.; Linksted, P.; Fitzpatrick, B.; Jackson, C.; Pattie, A.; Dominiczak, A. F.; Morris, A. D.; Smith, B. H. (May–June 2010). "Shared genetic aetiology between cognitive ability and cardiovascular disease risk factors: Generation Scotland's Scottish family health study". Intelligence 38 (3): 304–313. doi:10.1016/j.intell.2010.03.002.
  11. Hill, W. G.; Goddard, M. E.; Visscher, P. M. (2008). "Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits". In MacKay, Trudy F. C. PLoS Genetics 4 (2): e1000008. doi:10.1371/journal.pgen.1000008. PMC 2265475. PMID 18454194.
  12. Visscher, P. M.; Hill, W. G.; Wray, N. R. (April 2008). "Heritability in the genomics era – concepts and misconceptions". Nature Reviews Genetics 9 (4): 255–266. doi:10.1038/nrg2322. PMID 18319743.
  13. Plomin, R.; DeFries, J. C.; McClearn, G. E. (1990). Behavioral Genetics. New York: Freeman.
  14. Rose, S. P. R. (2006). "Commentary: Heritability estimates – long past their sell-by date". International Journal of Epidemiology 35 (3): 525–527. doi:10.1093/ije/dyl064. PMID 16645027.
  15. Joseph, J. (2004). The Gene Illusion. Algora. (2003 United Kingdom edition by PCCS Books.) Chapter 5 contains a critique of the heritability concept.
  16. Turkheimer, E. (2011). "Still Missing". Research in Human Development 8 (3–4): 227–241. doi:10.1080/15427609.2011.625321.

Books

  • Falconer, D. S., and T. Mackay, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, Essex, England.
  • Kempthorne, O (1957 [1969]) An Introduction to Genetic Statistics. John Wiley. Reprinted, 1969 by Iowa State University Press.
