Resetting the bar: Statistical significance in whole‐genome sequencing‐based association studies of global populations
Author(s) -
Pulit Sara L.,
de With Sera A. J.,
de Bakker Paul I. W.
Publication year - 2017
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.22032
Subject(s) - genome wide association study, genetic association, biology, genotyping, genetics, whole genome sequencing, computational biology, genome, dna sequencing, sample size determination, false discovery rate, genotype, evolutionary biology, statistics, single nucleotide polymorphism, gene, mathematics
ABSTRACT Genome‐wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype‐phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping‐based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome‐wide significance thresholds for various analysis scenarios. Using whole‐genome sequence data, we simulated sequencing‐based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome‐wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome‐wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
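To give a sense of scale for the thresholds in the abstract, the sketch below converts each one into the number of independent tests it would correspond to under a Bonferroni-style correction. The family-wise error rate of 0.05 and the Bonferroni relation are assumptions made here for illustration; the abstract states only that the thresholds were determined empirically from simulated studies. The names `ALPHA`, `THRESHOLDS`, and `implied_independent_tests` are hypothetical.

```python
# Assumption (not stated in the abstract): a family-wise error rate of 0.05
# and a Bonferroni-style correction, threshold = alpha / number_of_tests.
ALPHA = 0.05

# Genome-wide significance thresholds reported in the abstract.
THRESHOLDS = {
    "European, Asian, or admixed": 5e-9,
    "African": 1e-9,
}


def implied_independent_tests(threshold, alpha=ALPHA):
    """Return M such that alpha / M equals the given threshold."""
    return alpha / threshold


for ancestry, p_threshold in THRESHOLDS.items():
    n_tests = implied_independent_tests(p_threshold)
    print(f"{ancestry}: P = {p_threshold:.0e} implies about {n_tests:,.0f} independent tests")
```

Under these assumptions, the more stringent African threshold corresponds to about five times as many independent tests as the threshold for the other ancestries. This is a back-of-the-envelope reading of the reported values and not the authors' procedure.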