Statistical heterogeneity in systematic reviews of clinical trials: a critical appraisal of guidelines and practice

Abstract
Objective: Heterogeneity between study results can be a problem in any systematic review or meta-analysis of clinical trials. Identifying its presence, investigating its cause and correctly accounting for it in analyses all involve difficult decisions for the researcher. Our objectives were: to collate recommendations on dealing with heterogeneity in systematic reviews of clinical trials; to investigate current practice in addressing heterogeneity in Cochrane reviews; and to compare current practice with recommendations.

Methods: We reviewed guidelines for those undertaking systematic reviews and examined how heterogeneity is addressed in practice in a sample of systematic reviews, and their protocols, from the Cochrane Database of Systematic Reviews.

Results: Advice to reviewers is on the whole consistent and sensible. However, examination of a sample of Cochrane protocols and reviews demonstrates that the advice is difficult to follow, given the small numbers of studies identified in many systematic reviews, the difficulty of pre-specifying important effect modifiers for subgroup analysis or meta-regression, and the unresolved debate concerning fixed versus random effects meta-analyses. Protocols and reviews often disagreed, either in their choice of important potential effect modifiers or because the review identified too few studies to perform the planned analyses.

Conclusion: Guidelines that address practical issues are required to reduce the risk of spurious findings from investigations of heterogeneity. This may involve discouraging statistical investigations such as subgroup analyses and meta-regression unless a large number of studies is available, rather than simply adopting a cautious approach to their interpretation. The notion of a priori specification of potential effect modifiers for a retrospective review of studies is ill-defined, and the appropriateness of using a statistical test for heterogeneity to decide between analysis strategies is suspect.