In statistics, study heterogeneity is a problem that can arise when attempting to undertake a meta-analysis. Ideally, the studies whose results are being combined in the meta-analysis should all be undertaken in the same way and to the same experimental protocols: study heterogeneity is a term used to indicate that this ideal is not fully met.
Meta-analysis is a method used to combine the results of different trials in order to obtain a quantified synthesis. The size of individual clinical trials is often too small to detect treatment effects reliably. Meta-analysis increases the power of statistical analyses by pooling the results of all available trials.
As one tries to use the meta-analysis to estimate a combined effect from a group of similar studies, there needs to be a check that the effects found in the individual studies are similar enough that one can be confident that a combined estimate will be a meaningful description of the set of studies. However, the individual estimates of treatment effect will vary by chance; some variation is expected. The question is whether there is more variation than would be expected by chance alone. When this excessive variation occurs, it is called statistical heterogeneity, or just heterogeneity.
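One common way to check for variation beyond chance is Cochran's Q, the inverse-variance weighted sum of squared deviations of the study estimates from their pooled mean. The sketch below is illustrative only: the function name `cochran_q` and the example numbers are assumptions, not taken from any particular study. If the studies share a single true effect, Q approximately follows a chi-squared distribution with k − 1 degrees of freedom, so a Q well above k − 1 suggests heterogeneity.

```python
def cochran_q(effects, variances):
    """Return (Q, degrees of freedom) for a set of study estimates.

    effects   -- per-study effect estimates (hypothetical inputs)
    variances -- the within-study variance of each estimate
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted squared deviations from the pooled (fixed-effect) estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


# Three made-up studies with equal precision.
q, df = cochran_q([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(q, df)  # Q is about 8 on 2 degrees of freedom
```

In this made-up example Q is roughly four times its degrees of freedom, which would usually be read as a sign of heterogeneity. A formal test would compare Q with the chi-squared distribution.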
When there is heterogeneity that cannot readily be explained, one analytical approach is to incorporate it into a random effects model. A random effects meta-analysis model involves an assumption that the effects being estimated in the different studies are not identical, but follow some distribution. The model represents the lack of knowledge about why real, or apparent, treatment effects differ by treating the differences as if they were random. The centre of this symmetric distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any distributional assumption, and this is a common criticism of random effects meta-analyses. However, simulations have shown that methods are relatively robust even under extreme distributional assumptions, both in estimating heterogeneity, and calculating an overall effect size.
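As a minimal sketch of such a model, the code below uses the DerSimonian-Laird moment estimator of the between-study variance. This estimator is one widely used choice and is assumed here for illustration; the text above does not prescribe it. The function name and the input numbers are hypothetical. The returned `tau2` corresponds to the width of the distribution of true effects, and `mu` to its centre.

```python
import math


def dersimonian_laird(effects, variances):
    """Return (mu, se, tau2) under a random effects model.

    mu   -- estimated average effect (centre of the distribution)
    se   -- standard error of mu
    tau2 -- estimated between-study variance (heterogeneity)
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Scaling constant of the moment estimator.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Negative estimates are truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2


mu, se, tau2 = dersimonian_laird([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(mu, se, tau2)
```

Because `tau2` is added to every study's variance, the random effects weights are more nearly equal than fixed-effect weights, and the standard error of the average effect is larger whenever `tau2` is positive.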
However, most meta-analyses include only two to four studies, and such a sample is more often than not inadequate to estimate heterogeneity accurately. In small meta-analyses, the between-study variance is therefore often incorrectly estimated as zero, leading to a false assumption of homogeneity. Overall, it appears that heterogeneity is consistently underestimated in meta-analyses.
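The effect of a small number of studies can be illustrated with a short simulation. Everything in the sketch is an assumption chosen for illustration: the DerSimonian-Laird moment estimator, the true between-study variance of 0.01, the within-study variance of 0.04, and the function names. It counts how often the estimated between-study variance is truncated to zero even though the true value is positive.

```python
import random


def dl_tau2(effects, variances):
    """DerSimonian-Laird estimate of between-study variance, truncated at zero."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (k - 1)) / c)


def zero_estimate_rate(k, true_tau2, within_var, reps=2000, seed=0):
    """Fraction of simulated meta-analyses of k studies with a zero estimate."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(reps):
        effects = []
        for _ in range(k):
            # True study effect varies around 0 with variance true_tau2.
            theta = rng.gauss(0.0, true_tau2 ** 0.5)
            # Observed estimate adds within-study sampling error.
            effects.append(rng.gauss(theta, within_var ** 0.5))
        if dl_tau2(effects, [within_var] * k) == 0.0:
            zeros += 1
    return zeros / reps


rate_small = zero_estimate_rate(k=3, true_tau2=0.01, within_var=0.04)
rate_large = zero_estimate_rate(k=30, true_tau2=0.01, within_var=0.04)
print(rate_small, rate_large)
```

Under these assumed settings, a zero estimate should be considerably more common with three studies than with thirty, although heterogeneity is present in both cases.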
One measure of heterogeneity is I², a statistic that indicates the percentage of variance in a meta-analysis that is attributable to study heterogeneity. When heterogeneity is substantial, a prediction interval rather than a confidence interval can give a better sense of the uncertainty around the effect estimate.
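Both quantities can be sketched in a few lines. The code assumes the common definition of I² in terms of Cochran's Q and its degrees of freedom, and a prediction interval of the form mu ± t × sqrt(tau2 + se²), where t is a critical value from a t distribution with k − 2 degrees of freedom. The function names and input numbers are hypothetical, and the critical value is passed in by the caller rather than computed.

```python
import math


def i_squared(q, df):
    """I² as a percentage, from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Share of total variation exceeding what chance alone would produce.
    return max(0.0, (q - df) / q) * 100.0


def prediction_interval(mu, se, tau2, t_crit):
    """Approximate interval for the true effect in a new, similar study.

    mu, se -- random effects estimate of the average effect and its standard error
    tau2   -- estimated between-study variance
    t_crit -- caller-supplied t critical value with k - 2 degrees of freedom
    """
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return mu - half_width, mu + half_width


print(i_squared(8.0, 2))  # 75.0

# Made-up summary values for five studies; 3.182 is the two-sided 95%
# critical value of a t distribution with 3 degrees of freedom.
low, high = prediction_interval(mu=0.3, se=0.1, tau2=0.03, t_crit=3.182)
print(low, high)
```

With these made-up numbers the prediction interval spans roughly −0.34 to 0.94, far wider than the confidence interval for the average effect (about 0.10 to 0.50 using a normal critical value of 1.96). The difference arises because the prediction interval also accounts for the between-study variance.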
The presence of heterogeneity may affect the statistical validity of the summary estimate of effect. Although this can be difficult to establish, the Vn statistic represents a direct measure of statistical validity. As a test of the null hypothesis of statistical validity it has limitations, but these can be partly overcome by using it in conjunction with the Q statistic.
- Kontopantelis, E.; Springate, D. A.; Reeves, D. (2013). Friede, Tim, ed. "A Re-Analysis of the Cochrane Library Data: The Dangers of Unobserved Heterogeneity in Meta-Analyses". PLoS ONE. 8 (7): e69930. doi:10.1371/journal.pone.0069930. PMID 23922860.
- Kontopantelis, E.; Reeves, D. (2012). "Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: A simulation study". Statistical Methods in Medical Research. 21 (4): 409–26. doi:10.1177/0962280210392008. PMID 21148194.
- Higgins, J. P. T.; Thompson, S. G.; Deeks, J. J.; Altman, D. G. (2003). "Measuring inconsistency in meta-analyses". BMJ. 327 (7414): 557–560. doi:10.1136/bmj.327.7414.557. PMID 12958120.
- Chiolero, A.; Santschi, V.; Burnand, B.; Platt, R. W.; Paradis, G. (2012). "Meta-analyses: with confidence or prediction intervals?". European Journal of Epidemiology. 27 (10): 823–5. doi:10.1007/s10654-012-9738-y. PMID 23070657.
- Willis BH, Riley RD (2017). "Measuring the statistical validity of summary meta-analysis and meta-regression results for use in clinical practice" (PDF). Statistics in Medicine. 36 (21): 3283–3301. doi:10.1002/sim.7372. PMID 28620945.
- Hoaglin DC (2016). "Misunderstandings about Q and 'Cochran's Q test' in meta-analysis". Statistics in Medicine. 35 (4): 485–95. doi:10.1002/sim.6632. PMID 26303773.