Missing heritability problem
The "missing heritability" problem is the observation that individual genetic variants cannot account for much of the heritability of diseases, behaviors, and other phenotypes. The problem has significant implications for medicine: either a person's susceptibility to disease depends more on "the combined effect of all the genes in the background than on the disease genes in the foreground", or the role of genes has been severely overestimated.
The "missing heritability" problem was named in 2008, after the "missing baryon problem" in physics. The Human Genome Project led to optimistic forecasts that the large genetic contributions to many traits and diseases (identified by quantitative genetics and behavioral genetics in particular) would soon be mapped and pinned down to specific genes and their genetic variants. The main method was the candidate-gene study, which used small samples with limited genetic sequencing to focus on specific genes believed to be involved, typically examining single-nucleotide polymorphisms (SNPs). While many hits were found, they often failed to replicate in other studies.
The exponential fall in genome sequencing costs led to the use of genome-wide association studies (GWASes), which could simultaneously examine all candidate genes in samples larger than those of the original findings. In these studies the candidate-gene hits were found to be almost always false positives, with only 2-6% replicating. In the specific case of intelligence, only 1 candidate-gene hit replicated; the top 25 schizophrenia candidate genes were no more associated with schizophrenia than chance; and of 15 neuroimaging hits, none replicated. The editorial board of Behavior Genetics noted, in setting more stringent requirements for candidate-gene publications, that "the literature on candidate gene associations is full of reports that have not stood up to rigorous replication...it now seems likely that many of the published findings of the last decade are wrong or misleading and have not contributed to real advances in knowledge". Other researchers have characterized the literature as having "yielded an infinitude of publications with very few consistent replications" and called for a phase-out of candidate-gene studies in favor of polygenic scores.
This led to a dilemma. Standard genetics methods have long estimated large heritabilities, such as 80%, for traits such as height or intelligence, yet none of the genes had been found despite sample sizes that, while small, should have been able to detect variants of reasonable effect size such as 1 inch or 5 IQ points. If genes have such strong cumulative effects, where are they? Several resolutions have been proposed, in which the missing heritability is explained by some combination of the following:
- Twin studies and other methods were grossly biased by issues long raised by their critics; there was little genetic influence to be found
- Genetic effects are actually epigenetic
- Genetic effects are generally non-additive and due to complex interactions. Among many proposals, one model takes into account the effect of epigenetic inheritance on the risk and recurrence risk of a complex disease. Another, the limiting pathway (LP) model, treats a trait as depending on the values of k inputs that can be rate-limiting, for example because of stoichiometric ratios, reactants required in a biochemical pathway, or proteins required for transcription of a gene. Each of these k inputs is a strictly additive trait that depends on a set of common or rare variants. When k = 1, the LP model reduces to a standard additive trait.
- Genetic effects are not due to the common SNPs examined in the candidate-gene studies & GWASes, but due to very rare mutations, copy-number variations, and other exotic kinds of genetic variants. These variants tend to be harmful and kept at low frequencies by natural selection. Whole-genome sequencing would be required to track down specific rare variants.
- Traits are all misdiagnoses: one person's 'schizophrenia' is due to entirely different causes than another person's, and so while a gene may be involved in one case, it will not be involved in another, rendering GWASes futile
- Traits are genuine but inconsistently diagnosed or genetically influenced from country to country and time to time, leading to measurement error which, combined with genetic heterogeneity due to either race or environment, will bias meta-analyzed GWAS and GCTA results towards zero
- Genetic effects do act through common SNPs acting additively, but are highly polygenic: dispersed over hundreds or thousands of variants, each of small effect (such as a fraction of an inch or a fifth of an IQ point) and of low prior probability. Such variants are unexpected enough that a candidate-gene study is unlikely to select the right SNP out of hundreds of thousands of known SNPs, and GWASes up to 2010, with n<20000, would be unable to find hits reaching genome-wide statistical-significance thresholds. Much larger GWAS sample sizes, often n>100k, would be required to find any hits at all, and the number of hits would steadily increase with sample size after that.
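The limiting pathway model listed among the proposed resolutions can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the published implementation: it assumes the trait is the minimum of the k inputs (one reading of "limiting"), that every input is a standard normal variable with the same heritability (`h2_pathway`, a hypothetical parameter name), and that twins share no environment. All function names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_twin_correlation(k, h2_pathway, genetic_sharing,
                              n_pairs=20_000, seed=0):
    """Trait correlation between twins under a toy limiting-pathway model.

    Assumptions (not taken from the article): the trait is min() of the
    k inputs; each input is additive with heritability h2_pathway;
    genetic_sharing is 1.0 for identical and 0.5 for fraternal twins;
    there is no shared environment.
    """
    rng = random.Random(seed)
    g_weight = math.sqrt(h2_pathway)
    e_weight = math.sqrt(1.0 - h2_pathway)
    shared_weight = math.sqrt(genetic_sharing)
    unique_weight = math.sqrt(1.0 - genetic_sharing)

    def twin_trait(shared_genetics):
        inputs = []
        for shared in shared_genetics:
            # Genetic value: part shared with the co-twin, part unique.
            genetic = shared_weight * shared + unique_weight * rng.gauss(0, 1)
            # Each input is additive: genetic value plus environment.
            inputs.append(g_weight * genetic + e_weight * rng.gauss(0, 1))
        # The trait is set by the lowest (limiting) input.
        return min(inputs)

    first, second = [], []
    for _ in range(n_pairs):
        shared_genetics = [rng.gauss(0, 1) for _ in range(k)]
        first.append(twin_trait(shared_genetics))
        second.append(twin_trait(shared_genetics))
    return pearson(first, second)


for k in (1, 4):
    r_mz = simulate_twin_correlation(k, 0.8, genetic_sharing=1.0)
    r_dz = simulate_twin_correlation(k, 0.8, genetic_sharing=0.5)
    print(f"k={k}: r_MZ={r_mz:.3f}, r_DZ={r_dz:.3f}, "
          f"2*(r_MZ - r_DZ)={2 * (r_mz - r_dz):.3f}")
```

With k = 1 the simulated identical-twin correlation should be close to `h2_pathway` and the fraternal-twin correlation close to half of it, as expected for an additive trait. With larger k the identical-twin correlation exceeds twice the fraternal-twin correlation, which is the kind of departure from additivity that the non-additive resolution relies on.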
The highly polygenic resolution to the missing heritability problem was supported by the introduction of genome-wide complex trait analysis (GCTA) in 2010, which demonstrated that trait similarity could be predicted by the genetic similarity of unrelated strangers on common SNPs treated additively, and that for many traits the SNP heritability was indeed a substantial fraction of the overall heritability. The GCTA results were further buttressed by findings that a small percentage of trait variance could be predicted, in GWASes without any genome-wide statistically significant hits, by a linear model including all SNPs regardless of p-value. If there were no SNP contribution, this would be unlikely, but it is what one would expect from SNPs whose effects were very imprecisely estimated by a too-small sample. Combined with the upper bound on maximum effect sizes set by the GWASes up to then, this strongly implied that the highly polygenic theory was correct. Examples of complex traits where increasingly large-scale GWASes have yielded the initial hits, and then increasing numbers of hits as sample sizes increased from n<20k to n>100k or n>300k, include height, intelligence, and schizophrenia.
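The sample-size argument behind the highly polygenic resolution can be made concrete with a rough power calculation. This is a minimal sketch under stated assumptions rather than a standard tool: it uses a normal approximation for a single-SNP association test, a genome-wide significance threshold of 5e-8 (a common convention, not stated above), and two purely hypothetical effect sizes expressed as the fraction of trait variance one SNP explains. The function name `gwas_power` is illustrative.

```python
from statistics import NormalDist

# Conventional genome-wide significance threshold (assumption).
GENOME_WIDE_ALPHA = 5e-8


def gwas_power(n, variance_explained, alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a two-sided single-SNP association test.

    Normal approximation: the test statistic is treated as N(mu, 1)
    with mu = sqrt(n * variance_explained), where variance_explained
    is the fraction of trait variance due to the SNP.
    """
    std_normal = NormalDist()
    z_crit = -std_normal.inv_cdf(alpha / 2)
    mu = math_sqrt(n * variance_explained)
    return std_normal.cdf(mu - z_crit) + std_normal.cdf(-mu - z_crit)


def math_sqrt(x):
    """Square root helper kept local so the block has a single import."""
    return x ** 0.5


# Hypothetical effect sizes, chosen only for illustration.
LARGE_EFFECT = 0.01    # a SNP explaining 1% of trait variance
SMALL_EFFECT = 0.0002  # a SNP explaining 0.02% of trait variance

for n in (5_000, 20_000, 100_000, 300_000):
    print(f"n={n:>7}: power(large)={gwas_power(n, LARGE_EFFECT):.3f}, "
          f"power(small)={gwas_power(n, SMALL_EFFECT):.3f}")
```

Under these assumptions a variant explaining 1% of variance would very likely be detected even with a few thousand participants, whereas a variant explaining 0.02% has well under a 1% chance of reaching significance at n = 20,000 and more than a 95% chance at n = 300,000. This is consistent with hits appearing only once sample sizes passed roughly n = 100k.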
- Manolio, T. A.; Collins, F. S.; Cox, N. J.; Goldstein, D. B.; Hindorff, L. A.; Hunter, D. J.; McCarthy, M. I.; Ramos, E. M.; Cardon, L. R.; Chakravarti, A.; Cho, J. H.; Guttmacher, A. E.; Kong, A.; Kruglyak, L.; Mardis, E.; Rotimi, C. N.; Slatkin, M.; Valle, D.; Whittemore, A. S.; Boehnke, M.; Clark, A. G.; Eichler, E. E.; Gibson, G.; Haines, J. L.; MacKay, T. F. C.; McCarroll, S. A.; Visscher, P. M. (2009). "Finding the missing heritability of complex diseases". Nature. 461 (7265): 747–753. PMID 19812666. doi:10.1038/nature08494.
- Zuk, O.; Hechter, E.; Sunyaev, S. R.; Lander, E. S. (2012). "The mystery of missing heritability: Genetic interactions create phantom heritability". Proceedings of the National Academy of Sciences. 109 (4): 1193–1198. PMID 22223662. doi:10.1073/pnas.1119675109.
- Lee, S. H.; Wray, N. R.; Goddard, M. E.; Visscher, P. M. (2011). "Estimating Missing Heritability for Disease from Genome-wide Association Studies". American Journal of Human Genetics. 88 (3): 294–305. doi:10.1016/j.ajhg.2011.02.002.
- Slatkin, M. (2009). "Epigenetic Inheritance and the Missing Heritability Problem". Genetics. 182 (3): 845–850. PMID 19416939. doi:10.1534/genetics.109.102798.
- Eichler, E. E.; Flint, J.; Gibson, G.; Kong, A.; Leal, S. M.; Moore, J. H.; Nadeau, J. H. (2010). "Missing heritability and strategies for finding the underlying causes of complex disease". Nature Reviews Genetics. 11 (6): 446–450. PMID 20479774. doi:10.1038/nrg2809.
- "Personal genomes: The case of the missing heritability", Maher 2008
- "Replication Validity of Initial Association Studies: A Comparison between Psychiatry, Neurology and Four Somatic Diseases", Dumas-Mallet et al 2016
- "The False-positive to False-negative Ratio in Epidemiologic Studies", Ioannidis et al 2011
- "A Test-Replicate Approach to Candidate Gene Research on Addiction and Externalizing Disorders: A Collaboration Across Five Longitudinal Studies", Samek et al 2016
- Bevan et al 2012, "Genetic heritability of ischemic stroke and the contribution of previously reported candidate gene and genome-wide associations"
- Siontis et al 2010, "Replication of past candidate loci for common diseases and phenotypes in 100 genome-wide association studies"
- Duncan & Keller 2011, "A Critical Review of the First 10 Years of Candidate Gene-by-Environment Interaction Research in Psychiatry"
- Chabris, CF; Hebert, BM; Benjamin, DJ; Beauchamp, J; Cesarini, D; van der Loos, M; Johannesson, M; Magnusson, PK; Lichtenstein, P; Atwood, CS; Freese, J; Hauser, TS; Hauser, RM; Christakis, N; Laibson, D (2012). "Most reported genetic associations with general intelligence are probably false positives". Psychol Sci. 23: 1314–23. PMID 23012269. doi:10.1177/0956797611435528.
- Johnson et al 2017, "No evidence that schizophrenia candidate genes are more associated with schizophrenia than non-candidate genes"
- Jahanshad et al 2017, "Do Candidate Genes Affect the Brain's White Matter Microstructure? Large-Scale Evaluation of 6,165 Diffusion MRI Scans"
- Hewitt 2012, "Editorial Policy on Candidate Gene Association and Candidate Gene-by-Environment Interaction Studies of Complex Traits"
- Arango 2017, "Candidate gene associations studies in psychiatry: time to move forward"
- "Meta-GWAS Accuracy and Power (MetaGAP) calculator shows that hiding heritability is partially due to imperfect genetic correlations across studies", de Vlaming et al 2016: MetaGAP
- Wray & Maier 2014, "Genetic basis of complex genetic disease: the contribution of disease heterogeneity to missing heritability"
- Wray et al 2012, "Impact of diagnostic misclassification on estimation of genetic correlations using genome-wide genotypes"
- Lee et al 2013a, "Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs"
- Lee et al 2013b, "General framework for meta-analysis of rare variants in sequencing association studies"
- Sham & Purcell 2014, "Statistical power and significance testing in large-scale genetic studies"
- "Defining the role of common variation in the genomic and biological architecture of adult human height", Wood et al 2014
- Chabris et al 2012 reported only 1 possible hit using a sample of a few thousand; "GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment", Rietveld et al 2013, with n=100k, reported 3 hits; "Genome-wide association study identifies 74 loci associated with educational attainment", Okbay et al 2016, reported 74 hits using n=293k and ~160 when extended to n=404k