
Exact replication: Foundation of science or game of chance?
Author(s) -
Sophie K. Piper,
Ulrike Grittner,
André Rex,
Nico Riedel,
Felix Fischer,
Robert Nadon,
Bob Siegerink,
Ulrich Dirnagl
Publication year - 2019
Publication title -
PLOS Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 4.127
H-Index - 271
eISSN - 1545-7885
pISSN - 1544-9173
DOI - 10.1371/journal.pbio.3000188
Subject(s) - replication (statistics), sample size determination, biology, statistical power, statistical hypothesis testing, sample (material), computer science, meta analysis, statistics, confidence interval, bayesian probability, artificial intelligence, mathematics, physics, medicine, thermodynamics
The need for replication of initial results has been rediscovered only recently in many fields of research. In preclinical biomedical research, it is common practice to conduct exact replications with the same sample sizes as those used in the initial experiments. Such replication attempts, however, have a lower probability of replication than is generally appreciated. Indeed, in the common scenario of an effect just reaching statistical significance, the statistical power of the replication experiment assuming the same effect size is approximately 50%—in essence, a coin toss. Accordingly, we use the provocative analogy of “replicating” a neuroprotective drug animal study with a coin flip to highlight the need for larger sample sizes in replication experiments. Additionally, we provide detailed background for the probability of obtaining a significant p value in a replication experiment and discuss the variability of p values as well as pitfalls of simple binary significance testing in both initial preclinical experiments and replication studies with small sample sizes. We conclude that power analysis for determining the sample size of a replication study is obligatory within the currently dominant hypothesis testing framework. Moreover, publications should include effect size point estimates and corresponding measures of precision, e.g., confidence intervals, to allow readers to assess the magnitude and direction of reported effects and to potentially combine the results of the initial and replication studies later through Bayesian or meta-analytic approaches.
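The “coin toss” claim in the abstract follows from a standard normal approximation: if the original result just reaches p = 0.05 and the true effect is assumed to equal the observed one, the replication test statistic is centered on the significance threshold, so it exceeds that threshold half the time. A minimal sketch of this calculation (using a two-sided z-test approximation, not the authors' exact derivation; the function name `replication_power` is illustrative):

```python
from statistics import NormalDist

def replication_power(p_original, alpha=0.05):
    """Probability that an exact replication (same n, same design)
    reaches p < alpha, assuming the true effect equals the effect
    observed in the original study (two-sided z-test approximation)."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p_original / 2)  # z implied by the original p value
    z_crit = nd.inv_cdf(1 - alpha / 2)      # significance threshold
    # Under these assumptions the replication z-statistic is
    # approximately Normal(mean=z_obs, sd=1).
    return 1 - nd.cdf(z_crit - z_obs)

print(replication_power(0.05))  # an original p of exactly 0.05 gives power 0.5
print(replication_power(0.01))  # a stronger original result gives higher, but still modest, power
```

Under this approximation, even an original p of 0.01 yields a same-sample-size replication power of only about 73%, which is why the authors argue that a fresh power analysis, typically implying a larger sample, is needed for replication studies.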