Case-control study

Figure: Case-control study vs. cohort study on a timeline.

A case-control study is a widely used type of study design. It was originally developed in epidemiology, although its use has also been advocated for the social sciences.[1][2] It is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Case-control studies are often used to identify factors that may contribute to a medical condition by comparing subjects who have that condition or disease (the "cases") with subjects who do not have it but are otherwise similar (the "controls").[3] They require fewer resources but provide less evidence for causal inference than a randomized controlled trial.


The case-control study is a type of epidemiological observational study. An observational study is one in which subjects are not randomized to the exposed or unexposed groups. Instead, the subjects are observed in order to determine both their exposure and their outcome status, so the exposure status is not determined by the researcher. Porta's Dictionary of Epidemiology defines the case-control study as "an observational epidemiological study of persons with the disease (or another outcome variable) of interest and a suitable control group of persons without the disease (comparison group, reference group). The potential relationship of a suspected risk factor or an attribute to the disease is examined by comparing the diseased and nondiseased subjects with regard to how frequently the factor or attribute is present (or, if quantitative, the levels of the attribute) in each of the groups (diseased and nondiseased)."[4]

For example, in a study trying to show that people who smoke (the attribute) are more likely to be diagnosed with lung cancer (the outcome), the cases would be persons with lung cancer, the controls would be persons without lung cancer (not necessarily healthy), and some of each group would be smokers. If a larger proportion of the cases smoke than the controls, that suggests, but does not conclusively show, that the hypothesis is valid.

The case-control study is frequently contrasted with cohort studies, wherein exposed and unexposed subjects are observed until they develop an outcome of interest.[4][5]

Control group selection

Controls need not be in good health; inclusion of sick people is sometimes appropriate, as the control group should represent those at risk of becoming a case.[6] Controls should come from the same population as the cases, and their selection should be independent of the exposures of interest.[7]

Controls can have the same disease as the cases, but of a different grade or severity, so that they do not have the outcome of interest. However, because the difference between the cases and the controls will be smaller, this results in lower power to detect an exposure effect.

As with any epidemiological study, greater numbers in the study will increase the power of the study. Numbers of cases and controls do not have to be equal. In many situations, it is much easier to recruit controls than to find cases. Increasing the number of controls above the number of cases, up to a ratio of about 4 to 1, may be a cost-effective way to improve the study.[8]
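The diminishing return from adding controls can be illustrated with a common textbook approximation, which is not taken from the cited source: with a fixed number of cases and r controls per case, the statistical efficiency relative to an unlimited supply of controls is roughly r / (r + 1) when the true effect is close to null. The function name below is hypothetical, and the sketch only evaluates that approximation.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case, relative to
    unlimited controls, under the assumption of a near-null effect.

    Assumption: the textbook approximation r / (r + 1); real studies
    can deviate from it, especially for strong effects.
    """
    r = controls_per_case
    if r <= 0:
        raise ValueError("controls_per_case must be positive")
    return r / (r + 1)


# Tabulate the approximation for a few control-to-case ratios.
for r in (1, 2, 3, 4, 5, 10):
    print(f"{r} controls per case: {relative_efficiency(r):.3f}")
```

Under this approximation, moving from one to four controls per case raises the relative efficiency from 0.5 to 0.8, while going from four to ten adds only about 0.11 more, which is consistent with the rule of thumb of stopping at about 4 to 1.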

Strengths and weaknesses

Case-control studies are a relatively inexpensive and frequently used type of epidemiological study that can be carried out by small teams or individual researchers in single facilities in a way that more structured experimental studies often cannot be. They have pointed the way to a number of important discoveries and advances. The case-control study design is often used in the study of rare diseases or as a preliminary study where little is known about the association between the risk factor and disease of interest.[9]

Compared to prospective cohort studies, they tend to be less costly and shorter in duration. In several situations they have greater statistical power than cohort studies, which must often wait for a 'sufficient' number of disease events to accrue.

Case-control studies are observational in nature and thus do not provide the same level of evidence as randomized controlled trials. The results may be confounded by other factors, to the extent of giving the opposite answer to better studies. For example, a meta-analysis of what were considered 30 high-quality studies concluded that hormone replacement therapy halved the risk of coronary heart disease, when in fact the risk was, if anything, increased.[10][11] It may also be more difficult to establish the timeline from exposure to disease outcome in a case-control study than in a prospective cohort study, where the exposure is ascertained before the subjects are followed over time to determine their outcome status. The most important drawback of case-control studies is the difficulty of obtaining reliable information about an individual’s exposure status over time. Case-control studies are therefore placed low in the hierarchy of evidence.


One of the most significant triumphs of the case-control study was the demonstration of the link between tobacco smoking and lung cancer, by Richard Doll and Bradford Hill. They showed a statistically significant association in a large case-control study.[12] Opponents argued for many years that this type of study cannot prove causation, but the eventual results of cohort studies confirmed the causal link which the case-control studies suggested,[13][14] and it is now accepted that tobacco smoking is the cause of about 87% of all lung cancer mortality in the US.


Case-control studies were initially analyzed by testing whether or not there were significant differences between the proportion of exposed subjects among cases and controls.[15] Subsequently Cornfield[16] pointed out that, when the disease outcome of interest is rare, the odds ratio of exposure can be used to estimate the relative risk (see rare disease assumption). It was later shown by Miettinen in 1976 that this assumption is not necessary and that the odds ratio of exposure can be used to directly estimate the incidence rate ratio of exposure without the need for the rare disease assumption.[15][17][18]
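As a minimal sketch of the exposure odds ratio described above, the following computes it from a 2×2 table together with an approximate 95% confidence interval. The interval uses Woolf's logit method, which is a choice made here for illustration and is not mentioned in the text. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_woolf(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Exposure odds ratio with an approximate confidence interval.

    Assumption: Woolf's (logit) interval, which needs all four cell
    counts to be positive; z=1.96 gives roughly 95% coverage.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 80 of 100 cases and 50 of 100 controls were exposed.
or_, lo, hi = odds_ratio_woolf(80, 20, 50, 50)
print(f"odds ratio = {or_:.2f}, approx. 95% CI = ({lo:.2f}, {hi:.2f})")
```

With these made-up counts the odds of exposure are four times higher among cases than among controls. Whether that figure approximates a relative risk or an incidence rate ratio depends on how the controls were sampled, as discussed above.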

A very important challenge of modern statistics is the analysis of high-dimensional, low-sample-size data with complex dependence structure, which, thanks to technological advancements, are frequently seen in medicine and molecular biology. For example, when medical imaging is used, we might have to deal with high-dimensional case-control studies where the number of variables is comparable with, or even larger than, the number of subjects. In these situations, the Hotelling test, which is the most familiar test for multivariate case-control studies, performs poorly or cannot even be computed. Moreover, the central limit theorem cannot be applied because of the small sample sizes that are common in practice. Among other scholars, Marozzi proposed tests for case-control studies that can be applied without strict assumptions, including to high-dimensional, low-sample-size data with complex dependence structure, even when the number of variables is much larger than the number of subjects and the underlying population distributions are heavy-tailed or skewed.[19][20] In addition to magnetic resonance imaging data, these tests can be applied to data from other medical imaging techniques (such as computed tomography or X-ray radiography), chemometrics and microarray data (proteomics, transcriptomics). In transcriptomics, hundreds or thousands of gene expression measures are used to study differences between two biological conditions. In this case, the null hypothesis is that there is no difference in expression between the conditions under comparison. Genes do not act alone in a biological system, so their expression profiles should be modelled as mutually dependent variables.
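To make the setting concrete, the sketch below runs a generic label-permutation test on the Euclidean distance between the case and control mean vectors. This is not Marozzi's interpoint-distance test, only a simple stand-in that can still be computed when there are more variables than subjects and that does not rely on the central limit theorem. All names and the data are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_distance(cases, controls):
    """Euclidean distance between the two group mean vectors."""
    return math.dist(mean_vector(cases), mean_vector(controls))


def permutation_p_value(cases, controls, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for the mean-distance statistic.

    Group labels are reshuffled across whole subjects, so dependence
    among the variables within a subject is left intact.
    """
    rng = random.Random(seed)
    observed = mean_distance(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_distance(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 5 cases and 5 controls, 20 variables per subject,
# with every case shifted upward by 2.0 on every variable.
controls = [[0.1 * i + 0.01 * j for j in range(20)] for i in range(5)]
cases = [[2.0 + value for value in row] for row in controls]
print("p-value:", permutation_p_value(cases, controls))
```

Shuffling whole subjects rather than individual variables is the key design choice: it respects the mutual dependence of the variables while still giving a valid reference distribution under the null hypothesis of no group difference.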

References


  1. Forgues, B. 2012. Sampling on the Dependent Variable Is Not Always That Bad: Quantitative Case-Control Designs for Strategic Organization Research. Strategic Organization, 10(3): 269-275.
  2. Lacy, M. G. 1997. Efficiently Studying Rare Events: Case-Control Methods for Sociologists. Sociological Perspectives, 40(1): 129-154.
  3. "8. Case-control and cross sectional studies". Retrieved March 5, 2012.
  4. Porta M (editor). A Dictionary of Epidemiology. 5th edition. New York: Oxford University Press, 2008.
  5. Rothman K. Epidemiology: An Introduction. Oxford University Press, Oxford, England, 2002.
  6. Grimes D.A., Schulz K.F. Compared to what? Finding controls for case-control studies. Lancet 2005; 365: 1429–33.
  7. Schulz K.F., Grimes D.A. Case-control studies: research in reverse. Lancet 2002; 359: 431–34.
  8. Grimes D.A., Schulz K.F. Compared to what? Finding controls for case-control studies. Lancet 2005; 365: 1429–33.
  9. Levin, K. A. (2005). "Study design I". Evidence-Based Dentistry 6 (3): 78–79. doi:10.1038/sj.ebd.6400355. PMID 16184164.
  10. Lawlor, D. A., G. Davey Smith, and S. Ebrahim. 2004. Commentary: The hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology? Int. J. Epidemiol. 33: 464-467.
  11. Ioannidis, J. P. A. 2005. Contradicted and initially stronger effects in highly cited clinical research. JAMA 294: 218-228.
  12. Doll & Hill (1950). Smoking and carcinoma of the lung; preliminary report. BMJ, 2, 739.
  13. Doll & Hill (1956). Lung cancer and other causes of death in relation to smoking; a second report on the mortality of British doctors. BMJ, 2, 1071.
  14. Doll et al. (2004). Mortality in relation to smoking: 50 years’ observations on male British doctors. BMJ, 328, 1507.
  15. Rodrigues L, Kirkwood BR. Case-control designs in the study of common diseases: updates on the demise of the rare disease assumption and the choice of sampling scheme for controls. International J of Epidemiology 1990; 19(1):205-13.
  16. Greenhouse SW. Jerome Cornfield’s contributions to epidemiology. Biometrics 1982; 38:33-45.
  17. Miettinen OS. Estimability and estimation in case-referent studies. Am J Epidemiol 1976; 103:226-236.
  18. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd edition. Wolters Kluwer, Lippincott Williams & Wilkins. 2008.
  19. Marozzi, Marco (2014). "Multivariate tests based on interpoint distances with application to magnetic resonance imaging". Statistical Methods in Medical Research. doi:10.1177/0962280214529104.
  20. Marozzi, Marco (2015). "Multivariate multidistance tests for high-dimensional low sample size case-control studies". Statistics in Medicine. doi:10.1002/sim.6418.

Further reading

  • Stolley, Paul D.; Schlesselman, James J. (1982). Case-control studies: design, conduct, analysis. Oxford [Oxfordshire]: Oxford University Press. ISBN 0-19-502933-X.  (Still a very useful book, and a great place to start, but now a bit out of date.)
