
False discovery rate control in genome-wide association studies with population structure
Author(s) -
Matteo Sesia,
Stephen Bates,
Emmanuel J. Candès,
Jonathan Marchini,
Chiara Sabatti
Publication year - 2021
Publication title -
Proceedings of the National Academy of Sciences of the United States of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 5.011
H-Index - 771
eISSN - 1091-6490
pISSN - 0027-8424
DOI - 10.1073/pnas.2105841118
Subject(s) - spurious relationship, linkage disequilibrium, univariate, population stratification, genetic association, false discovery rate, population, biobank, genome wide association study, linkage (software), computer science, computational biology, multiple comparisons problem, biology, data mining, machine learning, genetics, multivariate statistics, statistics, haplotype, genotype, mathematics, single nucleotide polymorphism, medicine, environmental health, gene
Significance - Genome-wide association studies compare a phenotype to thousands of genetic variants, searching for associations of potential biological interest. Standard analyses rely on linear models of the phenotype given one variable at a time. However, their assumptions are difficult to verify, and their univariate approaches make it hard to distinguish interesting associations from spurious ones. Our work takes a different path: We analyze all variants simultaneously, modelling the randomness in the genotypes, which is better understood, instead of the phenotype. Our solution accounts for linkage disequilibrium and population structure, controls the false discovery rate, and leverages powerful machine-learning tools. Applications to the UK Biobank data indicate increased power compared to state-of-the-art alternatives and high replicability.