In statistics, study heterogeneity is a problem that can arise when attempting to undertake a meta-analysis. Ideally, the studies whose results are being combined in the meta-analysis should all be undertaken in the same way and to the same experimental protocols: study heterogeneity is a term used to indicate that this ideal is not fully met.
Meta-analysis is a method used to combine the results of different trials in order to obtain a quantified synthesis. The size of individual clinical trials is often too small to detect treatment effects reliably. Meta-analysis increases the power of statistical analyses by pooling the results of all available trials.
When a meta-analysis is used to estimate a combined effect from a group of similar studies, the effects found in the individual studies must be checked to confirm that they are similar enough for a combined estimate to be a meaningful description of the set of studies. However, the individual estimates of treatment effect will vary by chance, so some variation is expected. The question is whether there is more variation than would be expected by chance alone. When this excess variation occurs, it is called statistical heterogeneity, or just heterogeneity.
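One common way to ask whether the variation exceeds what chance alone would produce is Cochran's Q statistic, which the text above does not name; it is used here as an illustrative choice. The sketch below assumes each study reports an effect estimate and its within-study variance, and the function name and the data are hypothetical.

```python
def cochran_q(effects, variances):
    """Return (Q, degrees of freedom, fixed-effect pooled estimate)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1, pooled


# Hypothetical effect estimates (e.g. log odds ratios) and their variances.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.03, 0.03, 0.05, 0.01]
q, df, pooled = cochran_q(effects, variances)
print(f"Q = {q:.2f} on {df} degrees of freedom; pooled estimate = {pooled:.3f}")
```

If the studies share a single true effect, Q approximately follows a chi-squared distribution with k - 1 degrees of freedom (k being the number of studies), so a value of Q well above k - 1 suggests heterogeneity.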
When there is heterogeneity that cannot readily be explained, one analytical approach is to incorporate it into a random effects model. A random effects meta-analysis model involves an assumption that the effects being estimated in the different studies are not identical, but follow some distribution. The model represents the lack of knowledge about why real, or apparent, treatment effects differ by treating the differences as if they were random. The centre of this symmetric distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any distributional assumption, and this is a common criticism of random effects meta-analyses. However, simulations have shown that methods are relatively robust even under extreme distributional assumptions, both in estimating heterogeneity, and calculating an overall effect size.
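As a minimal sketch of such a model, the code below pools study effects under a normal random-effects assumption. It uses the DerSimonian-Laird moment estimator for the between-study variance, which is one widely used option and is an assumption here rather than something the text prescribes. The function name and the example data are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effects under a normal random-effects model.

    Uses the DerSimonian-Laird moment estimator for the between-study
    variance (tau^2); that choice is an assumption of this sketch.
    Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Moment estimate of tau^2, truncated at zero because a variance
    # cannot be negative.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical effect estimates and within-study variances.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.03, 0.03, 0.05, 0.01]
pooled, se, tau2 = random_effects_pool(effects, variances)
print(f"pooled = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.4f}")
```

Here tau^2 plays the role of the width of the distribution of true effects. When the observed variation is no larger than chance would predict, the estimate is truncated to zero and the result coincides with a fixed-effect analysis.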
However, most meta-analyses include only two to four studies, and such a sample is more often than not inadequate to estimate heterogeneity accurately. Thus it appears that in small meta-analyses, an incorrect estimate of zero between-study variance is obtained, leading to a false assumption of homogeneity. Overall, it appears that heterogeneity is being consistently underestimated in meta-analyses.
One measure of heterogeneity is I², a statistic that indicates the percentage of the total variation across studies in a meta-analysis that is attributable to heterogeneity rather than chance.
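The statistic can be sketched from Cochran's Q using the definition I² = 100% × (Q − df) / Q given by Higgins et al. (2003), where df is the number of studies minus one and negative values are set to zero. The function name and the example data below are hypothetical.

```python
def i_squared(effects, variances):
    """Return I^2 as a percentage, computed from Cochran's Q."""
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights and fixed-effect pooled estimate.
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # When Q does not exceed its degrees of freedom, I^2 is set to zero
    # (this also avoids dividing by zero when Q is exactly 0).
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical effect estimates and within-study variances.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.03, 0.03, 0.05, 0.01]
print(f"I^2 = {i_squared(effects, variances):.1f}%")
```

Because I² is derived from Q, it inherits the imprecision of Q when only a few studies are available.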
- Kontopantelis, E.; Springate, D. A.; Reeves, D. (2013). "A Re-Analysis of the Cochrane Library Data: The Dangers of Unobserved Heterogeneity in Meta-Analyses". In Friede, Tim. PLoS ONE 8 (7): e69930. doi:10.1371/journal.pone.0069930. PMC 3724681. PMID 23922860.
- Kontopantelis, E.; Reeves, D. (2010). "Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: A simulation study". Statistical Methods in Medical Research 21 (4): 409–426. doi:10.1177/0962280210392008. PMID 21148194.
- Higgins, J. P. T.; Thompson, S. G.; Deeks, J. J.; Altman, D. G. (2003). "Measuring inconsistency in meta-analyses". BMJ 327 (7414): 557–560. doi:10.1136/bmj.327.7414.557. PMC 192859. PMID 12958120.