Comparison of four heterogeneity measures for meta‐analysis
Author(s) - Lin Lifeng
Publication year - 2020
Publication title - Journal of Evaluation in Clinical Practice
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.737
H-Index - 73
eISSN - 1365-2753
pISSN - 1356-1294
DOI - 10.1111/jep.13159
Subject(s) - statistics, study heterogeneity, meta analysis, pairwise comparison, statistic, econometrics, type i and type ii errors, statistical power, summary statistics, reliability (semiconductor), mathematics, statistical hypothesis testing, computer science, power (physics), medicine, confidence interval, physics, quantum mechanics
Rationale, aims, and objectives
Heterogeneity is a critical issue in meta‐analysis, because it bears on whether combining the collected studies is appropriate and affects the reliability of the synthesized results. The $Q$ test is a traditional method to assess heterogeneity; however, because it lacks an intuitive interpretation for clinicians and often has low statistical power, many meta‐analysts instead use measures such as the $I^2$ statistic to quantify the extent of heterogeneity. This article aims to summarize the available tools for assessing heterogeneity and to compare their performance.

Methods
We reviewed four heterogeneity measures ($I^2$, $\hat{R}_I$, $\hat{R}_M$, and $\hat{R}_b$) and illustrated how they could be treated as test statistics like the $Q$ statistic. These measures were compared with respect to statistical power based on simulations driven by three real‐data examples. The pairwise agreement among the four measures was also evaluated using Cohen's $\kappa$ coefficient.

Results
Generally, $\hat{R}_I$ was slightly more powerful than the $Q$ test, while its type I error rate might be slightly inflated. The power of $I^2$ was fairly close to that of $Q$. The $\hat{R}_M$ and $\hat{R}_b$ statistics might have low power in some cases. Because the differences between the powers of $I^2$, $\hat{R}_I$, and $Q$ were often tiny, meta‐analysts should not expect $I^2$ or $\hat{R}_I$ to yield significant heterogeneity if the $Q$ test fails to do so. In addition, $I^2$ and $\hat{R}_I$ had fairly good agreement based on the simulated meta‐analyses, but all other pairs of heterogeneity measures generally had poor agreement.

Conclusion
The $I^2$ and $\hat{R}_I$ statistics are recommended for measuring heterogeneity. Meta‐analysts should use the heterogeneity measures as descriptive statistics that have intuitive interpretations from the clinical perspective, instead of determining the significance of heterogeneity simply based on their magnitudes.
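
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of the two most widely used ones, Cochran's Q and I², together with Cohen's κ for two sets of binary "significant heterogeneity" decisions. It uses the commonly cited textbook definitions (inverse-variance weights for Q, and I² = max(0, (Q − df)/Q) with df equal to the number of studies minus one), which are assumed here rather than taken from the article itself. The other three measures are not sketched because the abstract does not define them. The function names and the input numbers are hypothetical and serve only as illustration.

```python
import math
from typing import Sequence


def cochran_q(effects: Sequence[float], variances: Sequence[float]) -> float:
    """Cochran's Q: inverse-variance weighted sum of squared deviations of
    the study effects from the fixed-effect pooled estimate."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q: float, n_studies: int) -> float:
    """I^2 as a proportion in [0, 1): share of Q in excess of its degrees
    of freedom, truncated at zero."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equally long sequences of 0/1 decisions."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and equally long")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance if the two decisions were independent.
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_chance == 1.0:
        return math.nan  # kappa is undefined when chance agreement is total
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical effect sizes and within-study variances for three studies.
effects = [0.1, 0.3, 0.5]
variances = [0.01, 0.01, 0.01]
q = cochran_q(effects, variances)
print("Q =", q, " I^2 =", i_squared(q, len(effects)))

# Hypothetical reject/retain decisions of two measures on four meta-analyses.
print("kappa =", cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Under the null hypothesis of homogeneity, Q is conventionally compared with a chi-squared distribution on df degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.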
