Causal inference

From Wikipedia, the free encyclopedia

Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed.[1][2]
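
The difference between association and the response to a change in the cause can be seen in a toy simulation. The structural model below is a hypothetical assumption, not taken from the cited sources: a common cause Z drives both X and Y, and X itself has no effect on Y. Regressing Y on passively observed X picks up an association through Z. Setting X externally, which mimics changing the cause, shows that Y does not respond.

```python
import random

def simulate(n, intervene, seed=0):
    """Draw n samples from a hypothetical toy model with Z -> X and Z -> Y.

    X has no causal effect on Y; Z is a common cause (confounder).
    If intervene is True, X is set by an external randomizer instead of
    by Z, i.e. the cause is changed from outside the system.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0) if intervene else z + rng.gauss(0.0, 1.0)
        y = 2.0 * z + rng.gauss(0.0, 1.0)  # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

obs_slope = ols_slope(*simulate(20000, intervene=False))
int_slope = ols_slope(*simulate(20000, intervene=True))
print(f"observational slope:  {obs_slope:.2f}")  # expected near 1.0 (association via Z)
print(f"interventional slope: {int_slope:.2f}")  # expected near 0.0 (no causal effect)
```

In this model the observational slope is Cov(X, Y) / Var(X) = 2 / 2 = 1 even though X has no effect on Y, which is why an association alone does not settle the causal question.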

In epidemiology, when an association between an exposure (a putative risk factor) and a disease is found, causality is often uncertain. The Bradford Hill criteria[3] are often used to assess causality, although they are neither a definitive nor an exclusive means of establishing it. A recent trend, in the emerging interdisciplinary field of molecular pathological epidemiology (MPE), is to look for evidence that the exposure influences molecular pathology within diseased tissue or cells.[4] Linking the exposure to molecular pathologic signatures of the disease can help to assess causality. Because any given disease is inherently heterogeneous (the unique disease principle[5][6]), disease phenotyping and subtyping are growing trends in the biomedical and public health sciences,[7][8][9][10][11] as exemplified by personalized medicine and precision medicine.

Common frameworks for causal inference are structural equation modeling and the Rubin causal model.
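
As a minimal sketch of the potential-outcomes idea behind the Rubin causal model, the hypothetical population below gives every unit two outcomes, one with treatment (y1) and one without (y0). Only one of them is ever observed for a given unit, so the average treatment effect cannot be computed directly from data; with randomized assignment, the difference in observed group means estimates it. The population parameters and function names are illustrative assumptions.

```python
import random

def potential_outcomes(n, seed=1):
    """Hypothetical population: each unit has outcomes (y0, y1).

    The unit-level effect y1 - y0 varies around 3.0.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        baseline = rng.gauss(10.0, 2.0)
        effect = 3.0 + rng.gauss(0.0, 1.0)
        units.append((baseline, baseline + effect))
    return units

def randomized_estimate(units, seed=2):
    """Assign treatment by coin flip and compare observed group means.

    Each unit reveals only y1 (if treated) or y0 (if not); randomization
    makes the difference in means an unbiased estimate of the average
    treatment effect.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for y0, y1 in units:
        if rng.random() < 0.5:
            treated.append(y1)
        else:
            control.append(y0)
    return sum(treated) / len(treated) - sum(control) / len(control)

units = potential_outcomes(20000)
true_ate = sum(y1 - y0 for y0, y1 in units) / len(units)  # needs both outcomes
est_ate = randomized_estimate(units)
print(f"true ATE {true_ate:.2f}, estimated ATE {est_ate:.2f}")
```

The true ATE line is only computable because the simulation knows both outcomes for every unit; in real data, only the randomized estimate is available.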

In computer science, the problem of determining cause and effect from joint observational data on two time-independent variables, say X and Y, has been tackled by exploiting an asymmetry between the evidence for a model in the direction X → Y and in the direction Y → X. One idea is to include an independent noise term in the model and compare the evidence for the two directions.

Here are some of the noise models for the hypothesis X → Y with the noise E:

  • Additive noise:[12] Y = F(X) + E
  • Linear noise:[13] Y = pX + qE
  • Post-nonlinear:[14] Y = G(F(X) + E)
  • Heteroskedastic noise: Y = F(X) + E·G(X)
  • Functional noise:[15] Y = F(X, E)
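
The generative forms in the list above can be made concrete by sampling from each one. In this sketch the particular functions F, G and the heteroskedastic scale function (named G_scale here because the list reuses the letter G), the coefficients P and Q, the nonseparable F(X, E), and the uniform distributions for X and E are all hypothetical choices made for illustration; X and E are drawn independently, as the models assume.

```python
import math
import random

# Hypothetical choices of the functions in the noise models.
def F(x):
    return x ** 3

def G(u):
    return math.tanh(u)        # strictly increasing, hence invertible (post-nonlinear)

def G_scale(x):
    return 0.5 + abs(x)        # positive noise scale (heteroskedastic model)

P, Q = 2.0, 0.5                # hypothetical coefficients of the linear model

def draw(model, n, seed=0):
    """Sample n pairs (x, y) from one noise model for X -> Y.

    X and the noise E are drawn independently of each other.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e = rng.uniform(-1.0, 1.0)
        if model == "additive":
            y = F(x) + e
        elif model == "linear":
            y = P * x + Q * e
        elif model == "post_nonlinear":
            y = G(F(x) + e)
        elif model == "heteroskedastic":
            y = F(x) + e * G_scale(x)
        elif model == "functional":
            y = math.sin(x * e) + x * e   # a generic F(X, E), not separable
        else:
            raise ValueError(f"unknown model: {model}")
        pairs.append((x, y))
    return pairs

for name in ("additive", "linear", "post_nonlinear", "heteroskedastic", "functional"):
    print(name, draw(name, 3))
```

The models are nested in generality: the linear model is a special case of the additive one, the additive model is a special case of both the post-nonlinear (G the identity) and heteroskedastic (G constant) models, and all are special cases of the functional model.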

The common assumptions in these models are:

  • There are no other causes of Y.
  • X and E have no common causes.
  • The distribution of the cause is independent of the causal mechanism.

On an intuitive level, the idea is that the factorization of the joint distribution P(Cause,Effect) into P(Cause)*P(Effect | Cause) typically yields models of lower total complexity than the factorization into P(Effect)*P(Cause | Effect). Although the notion of “complexity” is intuitively appealing, it is not obvious how it should be precisely defined.[15]

References


  1. Pearl, Judea (1 January 2009). "Causal inference in statistics: An overview". Statistics Surveys 3: 96–146. doi:10.1214/09-SS057.
  2. Morgan, Stephen; Winship, Chris (2007). Counterfactuals and Causal Inference. Cambridge University Press. ISBN 978-0-521-67193-4.
  3. Hill, Austin Bradford (1965). "The Environment and Disease: Association or Causation?". Proceedings of the Royal Society of Medicine 58 (5): 295–300. PMC 1898525. PMID 14283879.
  4. Ogino S, Stampfer M (2010). "Lifestyle factors and microsatellite instability in colorectal cancer: the evolving field of molecular pathological epidemiology". J. Natl. Cancer Inst. 102 (6): 365–7. doi:10.1093/jnci/djq031. PMC 2841039. PMID 20208016.
  5. Ogino S, Lochhead P, Chan AT, Nishihara R, Cho E, Wolpin BM, Meyerhardt AJ, Meissner A, Schernhammer ES, Fuchs CS, Giovannucci E (2013). "Molecular pathological epidemiology of epigenetics: emerging integrative science to analyze environment, host, and disease". Mod Pathol 26: 465–484.
  6. Ogino S, Fuchs CS, Giovannucci E (2012). "How many molecular subtypes? Implications of the unique tumor principle in personalized medicine". Expert Rev Mol Diagn 12: 621–628.
  7. Begg CB (2011). "A strategy for distinguishing optimal cancer subtypes". Int J Cancer 129: 931–937.
  8. Galon J, et al. (2012). "Cancer classification using the Immunoscore: a worldwide task force". Journal of Translational Medicine 10: 205.
  9. Spitz MR, Caporaso NE, Sellers TA (2012). "Integrative cancer epidemiology--the next generation". Cancer Discov 2: 1087–90.
  10. Field AE, Camargo Jr CA, Ogino S (2013). "The merits of subtyping obesity: one size does not fit all". JAMA 310: 2147–2148.
  11. Zaidi N, Lupien L, Kuemmerle NB, Kinlaw WB, Swinnen JV, Smans K (2013). "Lipogenesis and lipolysis: The pathways exploited by the cancer cells to acquire fatty acids". Prog Lipid Res 52: 585–9.
  12. Hoyer, Patrik O., et al. (2008). "Nonlinear causal discovery with additive noise models". NIPS 21.
  13. Shimizu, Shohei, et al. (2011). "DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model". The Journal of Machine Learning Research 12: 1225–1248.
  14. Zhang, Kun; Hyvärinen, Aapo (2009). "On the identifiability of the post-nonlinear causal model". Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press.
  15. Mooij, Joris M., et al. (2010). "Probabilistic latent variable models for distinguishing between cause and effect". NIPS.