Mendelian randomization

In epidemiology, Mendelian randomization (commonly abbreviated to MR) is a method that uses measured variation in genes to interrogate the causal effect of an exposure on an outcome. Under key assumptions (see below), the design reduces both reverse causation and confounding, which often substantially impede or mislead the interpretation of results from epidemiological studies.[1]

The study design was first proposed in 1986[2] and subsequently described by Gray and Wheatley[3] as a method for obtaining unbiased estimates of the effects of a putative causal variable without conducting a traditional randomized controlled trial (the "gold standard" in epidemiology for establishing causality). These authors also coined the term Mendelian randomization. The description, application, interpretation and refinement of the Mendelian randomization method have since been pioneered by George Davey Smith, his group at the University of Bristol and worldwide collaborators.


One of the predominant aims of epidemiology is to identify modifiable causes of health outcomes and disease, especially those of public health concern. To ascertain whether modifying a particular trait (e.g. via an intervention, treatment or policy change) will convey a beneficial effect within a population, firm evidence that this trait causes the outcome of interest is required. However, many observational epidemiological study designs are limited in their ability to distinguish correlation from causation: specifically, whether a particular trait causes an outcome of interest, is simply related to that outcome (but does not cause it), or is a consequence of the outcome itself. Only the first of these is useful in a public health setting, where the aim is to modify the trait to reduce the burden of disease. There are many epidemiological study designs that aim to understand relationships between traits within a population sample, each with shared and unique advantages and limitations in terms of providing causal evidence, with the "gold standard" being randomized controlled trials.

Well-known successful demonstrations of causal evidence consistent across multiple studies with different designs include the identified causal links between smoking and lung cancer, and between blood pressure and stroke. However, there have also been notable failures, in which exposures hypothesized to be causal risk factors for a particular outcome were later shown by well-conducted randomized controlled trials not to be causal. For instance, it was previously thought that hormone replacement therapy would prevent cardiovascular disease, but it is now known to have no such benefit and may even adversely affect health.[4] Another notable example is that of selenium and prostate cancer. Some observational studies found an association between higher circulating selenium levels (usually acquired through various foods and dietary supplements) and lower risk of prostate cancer. However, the Selenium and Vitamin E Cancer Prevention Trial (SELECT) showed evidence that dietary selenium supplementation actually increased the risk of prostate and advanced prostate cancer and had an additional off-target effect of increasing type 2 diabetes risk.[5]

Such inconsistencies between observational epidemiological studies and randomized controlled trials are likely a function of social, behavioural, or physiological confounding factors in many observational epidemiological designs, which are particularly difficult to measure accurately and to control for. Moreover, randomized controlled trials are usually expensive, time-consuming and laborious, and many epidemiological findings cannot be ethically replicated in clinical trials.

Randomization approach

“Genetics is indeed in a peculiarly favoured condition in that Providence has shielded the geneticist from many of the difficulties of a reliably controlled comparison. The different genotypes possible from the same mating have been beautifully randomised by the meiotic process. A more perfect control of conditions is scarcely possible, than that of different genotypes appearing in the same litter.” — R.A. Fisher[6]

Mendelian randomization (MR) is a method that allows one to test for, or in certain cases to estimate, a causal effect from observational data in the presence of confounding factors. It uses common genetic polymorphisms with well-understood effects on exposure patterns (e.g., propensity to drink alcohol) or effects that mimic those produced by modifiable exposures (e.g., raised blood cholesterol[2]). Importantly, the genotype must affect disease status only indirectly, via its effect on the exposure of interest.[7]

Because genotypes are assigned randomly when passed from parents to offspring during meiosis, if we assume that mate choice is not associated with genotype (panmixia), then the population genotype distribution should be unrelated to the confounding factors that typically plague observational epidemiology studies. In this regard, Mendelian randomization can be thought of as a “naturally” randomized controlled trial. Because the polymorphism is the instrument, Mendelian randomization is dependent on prior genetic association studies having provided good candidate genes for response to risk exposure.[citation needed]

Statistical analysis

From a statistical perspective, Mendelian randomization (MR) is an application of the technique of instrumental variables,[8][9] with genotype acting as an instrument for the exposure of interest. The method has also been used in economic research studying the effects of obesity on earnings and other labor market outcomes.[10]
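The instrumental-variable idea can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from the cited sources: it assumes a single biallelic variant, a linear model, and a ratio (Wald-type) estimator, which is one common instrumental-variable estimator for a single instrument. All names (`simulate_cohort`, `wald_ratio`) and all numeric values (allele frequency 0.3, gene-exposure effect 0.5, causal effect 0.2) are hypothetical choices made for illustration.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    return covariance(xs, ys) / covariance(xs, xs)


def simulate_cohort(n, causal_effect, seed=0):
    """Simulate genotype G, exposure X and outcome Y with an unmeasured confounder U.

    Every coefficient here is an illustrative assumption.
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Two independent allele draws (assumed effect-allele frequency 0.3).
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)                # confounder, independent of genotype
        x = 0.5 * g + u + rng.gauss(0, 1)  # genotype and confounder shift the exposure
        y = causal_effect * x + u + rng.gauss(0, 1)  # genotype reaches Y only via X
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


def wald_ratio(genotype, exposure, outcome):
    """Ratio estimate: gene-outcome slope divided by gene-exposure slope."""
    return slope(genotype, outcome) / slope(genotype, exposure)


genotype, exposure, outcome = simulate_cohort(n=20000, causal_effect=0.2)
naive_estimate = slope(exposure, outcome)  # distorted by the confounder
mr_estimate = wald_ratio(genotype, exposure, outcome)
print(f"naive regression: {naive_estimate:.2f}, Wald ratio: {mr_estimate:.2f}")
```

In this simulated setting the naive regression of outcome on exposure is pulled away from the assumed causal effect by the confounder, whereas the ratio estimate should land close to it, up to sampling error.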

The accuracy of MR depends on a number of assumptions: that there is no direct relationship between the instrumental variable (the genotype) and the dependent variable (the outcome), and that there is no relationship between the instrumental variable and any possible confounding variables. In addition to being misled by direct effects of the instrument on the disease, the analyst may also be misled by linkage disequilibrium with unmeasured directly-causal variants, genetic heterogeneity, pleiotropy (often detected as a genetic correlation), or population stratification.[11] Mendelian randomization is widely used to analyze data from large-scale genome-wide association studies, which may adopt a case-control design. Under a case-control design, the conventional assumptions for instrumental variables are instead made in the population of controls.[12] Ignoring the ascertainment bias of a case-control study when performing a Mendelian randomization analysis can lead to considerable bias in the estimation of causal effects.[12]
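The effect of violating the first assumption can be sketched with simulated data. The example below is illustrative only: it assumes a linear model with one variant, and every name and number in it (`wald_ratio_under_pleiotropy`, allele frequency 0.3, gene-exposure effect 0.5, causal effect 0.2, direct effect 0.3) is hypothetical. It compares a ratio-type instrumental-variable estimate when the genotype acts on the outcome only through the exposure with the same estimate when the genotype also has a direct (pleiotropic) effect on the outcome.

```python
import random


def regression_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def wald_ratio_under_pleiotropy(direct_effect, n=20000, causal_effect=0.2, seed=1):
    """Ratio estimate when genotype also affects the outcome directly.

    direct_effect = 0 satisfies the 'no direct relationship' assumption;
    any other value violates it. All coefficients are illustrative.
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)  # 0, 1 or 2 effect alleles
        u = rng.gauss(0, 1)                                    # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0, 1)
        # The direct_effect * g term is the pathway that bypasses the exposure.
        y = causal_effect * x + direct_effect * g + u + rng.gauss(0, 1)
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return regression_slope(genotype, outcome) / regression_slope(genotype, exposure)


valid = wald_ratio_under_pleiotropy(direct_effect=0.0)
biased = wald_ratio_under_pleiotropy(direct_effect=0.3)
print(f"assumption holds: {valid:.2f}, with direct effect: {biased:.2f}")
```

Under these assumed values, the direct pathway shifts the ratio estimate by roughly the direct effect divided by the gene-exposure effect (0.3 / 0.5 here), so the second estimate overstates the simulated causal effect even though the sample is large.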


The basic idea of MR was proposed by Martijn B. Katan in 1986, when he suggested using apolipoprotein E alleles, which have known effects on blood cholesterol levels, to study the causal relationship between blood cholesterol and cancer.[2][13] However, MR is based on the instrumental variables of econometrics, which had already been introduced in 1928 by Philip Green Wright and Sewall Wright.[14] The term "Mendelian randomization" was first used by Richard Gray and Keith Wheatley in 1991.[3][15] It comes from the name of Gregor Mendel and the fact that alleles are distributed randomly in people at fertilisation.[15] MR studies became more common between 2007 and 2010 owing to concurrent progress in omics-type genetic research, which revealed many previously unknown connections between alleles and modifiable exposures.[16]


  1. ^ Smith GD, Ebrahim S (February 2003). "'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease?". International Journal of Epidemiology. 32 (1): 1–22. doi:10.1093/ije/dyg070. PMID 12689998.
  2. ^ a b c Katan MB (March 1986). "Apolipoprotein E isoforms, serum cholesterol, and cancer". Lancet. 1 (8479): 507–8. doi:10.1016/s0140-6736(86)92972-7. PMID 2869248. S2CID 38327985.
  3. ^ a b Gray R, Wheatley K (1991). "How to avoid bias when comparing bone marrow transplantation with chemotherapy". Bone Marrow Transplantation. 7 Suppl 3: 9–12. PMID 1855097.
  4. ^ Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, et al. (July 2002). "Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial". JAMA. 288 (3): 321–33. doi:10.1001/jama.288.3.321. PMID 12117397.
  5. ^ Klein EA, Thompson IM, Tangen CM, Crowley JJ, Lucia MS, Goodman PJ, et al. (October 2011). "Vitamin E and the risk of prostate cancer: the Selenium and Vitamin E Cancer Prevention Trial (SELECT)". JAMA. 306 (14): 1549–56. doi:10.1001/jama.2011.1437. PMC 4169010. PMID 21990298.
  6. ^ Fisher RA (April 2010). "Statistical methods in genetics. 1951". International Journal of Epidemiology. 39 (2): 329–35. doi:10.1093/ije/dyp379. PMID 20176585.
  7. ^ Holmes MV, Ala-Korpela M, Smith GD (October 2017). "Mendelian randomization in cardiometabolic disease: challenges in evaluating causality". Nature Reviews. Cardiology. 14 (10): 577–590. doi:10.1038/nrcardio.2017.78. PMC 5600813. PMID 28569269.
  8. ^ Thomas DC, Conti DV (February 2004). "Commentary: the concept of 'Mendelian Randomization'". International Journal of Epidemiology. 33 (1): 21–5. doi:10.1093/ije/dyh048. PMID 15075141.
  9. ^ Didelez V, Sheehan N (August 2007). "Mendelian randomization as an instrumental variable approach to causal inference". Statistical Methods in Medical Research. 16 (4): 309–30. doi:10.1177/0962280206077743. PMID 17715159. S2CID 6236517.
  10. ^ Böckerman P, Cawley J, Viinikainen J, Lehtimäki T, Rovio S, Seppälä I, et al. (January 2019). "The effect of weight on labor market outcomes: An application of genetic instrumental variables". Health Economics. 28 (1): 65–77. doi:10.1002/hec.3828. PMC 6585973. PMID 30240095.
  11. ^ Smith GD, Ebrahim S (February 2003). "'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease?". International Journal of Epidemiology. 32 (1): 1–22. doi:10.1093/ije/dyg070. PMID 12689998.
  12. ^ a b Zhang H, Qin J, Berndt SI, Albanes D, Deng L, Gail MH, Yu K (June 2020). "On Mendelian randomization analysis of case-control study". Biometrics. 76 (2): 380–391. doi:10.1111/biom.13166. PMID 31625599.
  13. ^ Smith GD (September 2010). "Mendelian Randomization for Strengthening Causal Inference in Observational Studies: Application to Gene × Environment Interactions". Perspectives on Psychological Science. 5 (5): 527–45. doi:10.1177/1745691610383505. PMID 26162196. S2CID 13624460.
  14. ^ Benn M, Nordestgaard BG (July 2018). "From genome-wide association studies to Mendelian randomization: novel opportunities for understanding cardiovascular disease causality, pathogenesis, prevention, and treatment". Cardiovascular Research. 114 (9): 1192–1208. doi:10.1093/cvr/cvy045. PMID 29471399.
  15. ^ a b Davey Smith G (September 2007). "Capitalizing on Mendelian randomization to assess the effects of treatments". Journal of the Royal Society of Medicine. 100 (9): 432–5. doi:10.1177/014107680710000923. PMC 1963388. PMID 17766918.
  16. ^ Sekula P, Del Greco MF, Pattaro C, Köttgen A (November 2016). "Mendelian Randomization as an Approach to Assess Causality Using Observational Data". Journal of the American Society of Nephrology. 27 (11): 3253–3265. doi:10.1681/ASN.2016010098. PMC 5084898. PMID 27486138.
