P > .05: The incorrect interpretation of “not significant” results is a significant problem
Author(s) - Smith Richard J.
Publication year - 2020
Publication title - American Journal of Physical Anthropology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.146
H-Index - 119
eISSN - 1096-8644
pISSN - 0002-9483
DOI - 10.1002/ajpa.24092
Subject(s) - null hypothesis, statistical hypothesis testing, statistics, statistical inference, inference, confidence interval, statistical significance, alternative hypothesis, statistical power, p value, multiple comparisons problem
Abstract Statistically nonsignificant (p > .05) results from a null hypothesis significance test (NHST) are often mistakenly interpreted as evidence that the null hypothesis is true: that there is “no effect” or “no difference.” However, many of these results occur because the study had low statistical power to detect an effect. Power below 50% is common, in which case a result of no statistical significance is more likely to be incorrect than correct. The inference of “no effect” is not valid even if power is high. NHST assumes that the null hypothesis is true; p is the probability of the data under the assumption that there is no effect. A statistical test cannot confirm what it assumes. These incorrect statistical inferences could be eliminated if decisions based on p values were replaced by a biological evaluation of effect sizes and their confidence intervals. For a single study, the observed effect size is the best estimate of the population effect size, regardless of the p value. Unlike p values, confidence intervals provide information about the precision of the observed effect. In the biomedical and pharmacology literature, methods have been developed to evaluate whether effects are “equivalent,” rather than zero, as tested with NHST. These methods could be used by biological anthropologists to evaluate the presence or absence of meaningful biological effects. Most of what appears to be known about no difference or no effect between sexes, between populations, between treatments, and other circumstances in the biological anthropology literature is based on invalid statistical inference.
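The abstract's reasoning can be illustrated with a minimal sketch. It is not the article's own method: the abstract does not name a specific equivalence procedure, so the two one-sided tests (TOST) approach is assumed here as one common choice, and every calculation uses a normal approximation rather than an exact t distribution. The function names and all numbers (effect size 0.5, 20 subjects per group, observed difference 0.3, standard error 0.25, equivalence margin 0.5) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; all results below are approximations


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)  # expected value of the test statistic
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def two_sided_p(diff, se):
    """Two-sided p value for an observed difference, assuming a true difference of zero."""
    return 2 * (1 - STD_NORMAL.cdf(abs(diff / se)))


def confidence_interval(diff, se, level=0.95):
    """Normal-approximation confidence interval around the observed difference."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


def is_equivalent(diff, se, margin, alpha=0.05):
    """TOST-style check: the (1 - 2*alpha) interval must lie entirely within +/- margin."""
    low, high = confidence_interval(diff, se, level=1 - 2 * alpha)
    return -margin < low and high < margin


# Hypothetical study: standardized effect of 0.5 with 20 subjects per group.
power = approx_power(effect_size=0.5, n_per_group=20)

# Hypothetical result: observed difference 0.3, standard error 0.25,
# and a difference smaller than 0.5 judged biologically negligible.
diff, se, margin = 0.3, 0.25, 0.5
p_value = two_sided_p(diff, se)
ci_low, ci_high = confidence_interval(diff, se)
equivalent = is_equivalent(diff, se, margin)

print(f"approximate power: {power:.2f}")
print(f"p value: {p_value:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"equivalence shown: {equivalent}")
```

Under these assumed numbers, power is well below 50%, the result is not significant (p > .05), and the 95% interval spans both zero and differences larger than the margin. The equivalence check therefore fails, so the data support neither "an effect exists" nor "no meaningful effect"; the interval simply shows that the estimate is imprecise.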