The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted

Author(s) -

Vsevolozhskaya Olga A.,

Kuo ChiaLing,

Ruiz Gabriel,

Diatchenko Luda,

Zaykin Dmitri V.

Publication year - 2017

Publication title -

Genetic Epidemiology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.301

H-Index - 98

eISSN - 1098-2272

pISSN - 0741-0395

DOI - 10.1002/gepi.22064

Subject(s) - statistical hypothesis testing , multiple comparisons problem , statistical power , replication (statistics) , false discovery rate , statistics , sample size determination , genome wide association study , statistical model , statistical significance , biology , genetics , econometrics , mathematics , gene , single nucleotide polymorphism , genotype

The increasing accessibility of data makes it possible for researchers to conduct massive amounts of statistical testing. Rather than using statistical analysis to follow specific scientific hypotheses, researchers can now test many possible relationships and let statistics generate hypotheses for them. Genetic epidemiology is an illustrative case: testing of candidate genetic variants for association with an outcome has been replaced by agnostic screening of the entire genome. Replication rates, which were poor for candidate gene studies, have improved dramatically with increasing genomic coverage, owing to factors such as the adoption of better statistical practices and the availability of larger sample sizes. Here, we suggest that another important factor behind the improved replicability of genome-wide scans is the increase in the amount of statistical testing itself. We show that increasing the number of tested hypotheses increases the proportion of true associations among the variants with the smallest P-values. We develop statistical theory to quantify how the expected proportion of genuine signals (EPGS) among top hits depends on the number of tests. This enrichment of top hits by real findings holds regardless of whether genome-wide statistical significance has been reached in a study. Moreover, if we consider only those “failed” studies that produce no statistically significant results, the same enrichment takes place: the proportion of true associations among top hits grows with the number of tests. The enrichment occurs even if true signals are encountered at a logarithmically decreasing rate as testing increases.
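The enrichment effect described above can be illustrated with a small Monte Carlo simulation. This sketch is not the authors' EPGS theory; it rests on assumptions chosen only for illustration: independent one-sided z-tests, a fixed fraction `pi1` of true signals whose z-statistics are shifted by `effect`, and a Bonferroni threshold `alpha / m` that defines a “failed” study. The function `top_hit_true_fraction` and all default parameter values are hypothetical. It estimates the proportion of true signals among the `k` smallest P-values as the number of tests `m` grows, optionally keeping only studies with no significant result.

```python
import heapq
import random
from statistics import NormalDist


def top_hit_true_fraction(m, pi1=0.01, effect=2.0, k=10, reps=100,
                          alpha=0.05, only_failed=False, seed=1):
    """Monte Carlo estimate of the share of true signals among the k
    smallest P-values in a study of m independent tests.

    Illustrative assumptions (hypothetical, not the paper's model):
    one-sided z-tests, null z ~ N(0, 1), true-signal z ~ N(effect, 1),
    and a fixed fraction pi1 of true signals. With only_failed=True,
    studies in which any test passes the Bonferroni threshold alpha / m
    are discarded, so only "failed" studies are averaged.
    """
    rng = random.Random(seed)
    n_true = max(1, round(pi1 * m))
    # P = 1 - Phi(z) falls as z rises, so "P < alpha/m" means "z > z_crit"
    # and the k smallest P-values belong to the k largest z-statistics.
    z_crit = NormalDist().inv_cdf(1 - alpha / m)
    hits = studies = 0
    for _ in range(reps):
        stats = [(rng.gauss(effect, 1.0), True) for _ in range(n_true)]
        stats += [(rng.gauss(0.0, 1.0), False) for _ in range(m - n_true)]
        top = heapq.nlargest(k, stats)  # sorted, largest z first
        if only_failed and top[0][0] > z_crit:
            continue  # this study reached significance; skip it
        hits += sum(is_true for _, is_true in top)
        studies += 1
    return hits / (k * studies) if studies else float("nan")


if __name__ == "__main__":
    for m in (100, 1_000, 10_000):
        all_studies = top_hit_true_fraction(m)
        failed_only = top_hit_true_fraction(m, only_failed=True)
        print(f"m={m:>6}: all studies {all_studies:.2f}, "
              f"failed studies only {failed_only:.2f}")
```

Under these assumptions the estimated share should climb with `m` in both modes, mirroring the qualitative claims of the abstract; the exact values depend on the hypothetical `pi1`, `effect`, and `k`. Holding `pi1` fixed means true signals accumulate at a constant rate per test, which is more generous than the logarithmically decreasing discovery rate the abstract also considers.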
