How large should the next study be? Predictive power and sample size requirements for replication studies
Author(s) -
Zwet Erik W.,
Goodman Steven N.
Publication year - 2022
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.9406
Subject(s) - replication (statistics), sample size determination, statistical power, statistics, replicate, sign (mathematics), statistical significance, value (mathematics), sample (material), power (physics), computer science, mathematics, physics, mathematical analysis, quantum mechanics, thermodynamics
Abstract - We use information derived from over 40K trials in the Cochrane Collaboration database of systematic reviews (CDSR) to compute the replication probability, or predictive power, of an experiment given its observed (two-sided) $P$-value. We find that an exact replication of a marginally significant result with $P = .05$ has less than 30% chance of again reaching significance. Moreover, the replication of a result with $P = .005$ still has only 50% chance of significance. We also compute the probability that the direction (sign) of the estimated effect is correct, which is closely related to the type S error of Gelman and Tuerlinckx. We find that if an estimated effect has $P = .05$, there is a 93% probability that its sign is correct. If $P = .005$, then that probability is 99%. Finally, we compute the required sample size for a replication study to achieve some specified power conditional on the $p$-value of the original study. We find that the replication of a result with $P = .05$ requires a sample size more than 16 times larger than the original study to achieve 80% power, while $P = .005$ requires at least 3.5 times larger sample size. These findings confirm that failure to replicate the statistical significance of a trial does not necessarily indicate that the original result was a fluke.
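
The quantities named in the abstract (predictive power, probability of a correct sign, and the effect of a larger replication sample) can be illustrated with a minimal sketch. It does not reproduce the authors' calculation: their prior is estimated from the Cochrane data, whereas this sketch assumes a single zero-mean normal prior on the signal-to-noise ratio with a hypothetical standard deviation `tau`, and it counts a replication as successful only if it is significant in the same direction as the original. The function names, `tau`, and the sample-size multiplier `k` are illustrative assumptions and do not come from the paper.

```python
from math import inf, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_from_p(p_two_sided):
    """Absolute z-statistic corresponding to a two-sided P-value."""
    return STD_NORMAL.inv_cdf(1 - p_two_sided / 2)


def shrinkage(tau):
    """Shrinkage factor for an assumed N(0, tau^2) prior on the SNR.

    tau = inf stands for a flat prior (no shrinkage).
    """
    if tau == inf:
        return 1.0
    return tau**2 / (1 + tau**2)


def predictive_power(p, tau=inf, k=1.0, alpha=0.05):
    """Probability that a replication is significant in the same direction.

    p     : two-sided P-value of the original study
    tau   : hypothetical prior sd of the signal-to-noise ratio (assumption)
    k     : replication sample size as a multiple of the original
    alpha : two-sided significance level of the replication
    """
    z = z_from_p(p)
    s = shrinkage(tau)
    # Posterior of the SNR given z is N(s*z, s). A replication with k times
    # the sample size has SNR sqrt(k)*SNR, so its z-statistic is predicted
    # as N(sqrt(k)*s*z, k*s + 1).
    replication_z = NormalDist(sqrt(k) * s * z, sqrt(k * s + 1))
    return 1 - replication_z.cdf(z_from_p(alpha))


def prob_sign_correct(p, tau=inf):
    """Posterior probability that the estimated effect has the right sign."""
    z = z_from_p(p)
    s = shrinkage(tau)
    # Posterior mean s*z divided by posterior sd sqrt(s) gives sqrt(s)*z.
    return STD_NORMAL.cdf(sqrt(s) * z)


if __name__ == "__main__":
    for tau in (inf, 1.0):  # flat prior versus an illustrative shrinkage prior
        print(
            f"tau={tau}: "
            f"power(P=.05)={predictive_power(0.05, tau):.3f}, "
            f"power(P=.05, k=16)={predictive_power(0.05, tau, k=16):.3f}, "
            f"sign(P=.05)={prob_sign_correct(0.05, tau):.3f}"
        )
```

Under the flat prior, the sketch gives a same-direction replication probability of exactly 50% for $P = .05$, and a sign probability of 97.5%. Any finite `tau` shrinks the estimate toward zero and lowers both figures, which is the direction of the abstract's results; the specific values reported there depend on the authors' Cochrane-based prior and are not reproduced here.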