Marbled Inflation From Population Structure in Gene‐Based Association Studies With Rare Variants
Author(s) - Liu Qianying, Nicolae Dan L., Chen Lin S.
Publication year - 2013
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21714
Subject(s) - population stratification , biology , single nucleotide polymorphism , genetics , genetic association , population , principal component analysis , inference , allele frequency , snp , confounding , genome wide association study , computational biology , gene , genotype , statistics , computer science , mathematics , artificial intelligence , medicine , environmental health
Accurate genetic association studies are crucial for the detection and validation of disease determinants. One of the main confounding factors affecting accuracy is population stratification, and great effort has been expended over the past decade to detect and adjust for it. Efficient solutions now exist for population stratification adjustment in single‐SNP (single‐nucleotide polymorphism) inference in genome‐wide association studies, but it is unclear whether these solutions can be applied effectively to rare variation studies, and in particular to gene‐based (or set‐based) association methods that jointly analyze multiple rare and common variants. We examine here, both theoretically and empirically, the performance of two commonly used approaches for population stratification adjustment, genomic control and principal component analysis, when used with gene‐based association tests. We show that, unlike in single‐SNP inference, genes with diverse compositions of rare and common variants may suffer from population stratification to varying extents. The inflation in gene‐level statistics can depend on the number and the allele frequency spectrum of SNPs in the gene, and on the gene‐based testing method used in the analysis. As a consequence, using a universal inflation factor for genomic control should be avoided in gene‐based inference with sequencing data. We also demonstrate that caution is needed when using principal component adjustment, because the accuracy of the adjusted analyses depends on the underlying population substructure, on the way the principal components are constructed, and on the number of principal components used to recover the substructure.
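
For readers unfamiliar with the genomic control inflation factor mentioned in the abstract, the following is a minimal sketch of how such a factor is commonly computed: the median observed 1-df chi-square statistic divided by its expected median under the null. It is not the authors' code. The function names, the conversion of p-values to 1-df chi-square equivalents, and the idea of grouping genes (for example by variant count or allele frequency spectrum) are assumptions made here purely for illustration.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom, obtained as
# the square of the 75th percentile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square equivalent."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(p_values):
    """Genomic control lambda: observed median chi-square over the null median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def lambda_by_group(p_values, groups):
    """Compute lambda separately within each group label.

    The grouping is hypothetical: labels might encode, say, bins of
    gene size or allele frequency spectrum chosen by the analyst.
    """
    by_group = {}
    for p, g in zip(p_values, groups):
        by_group.setdefault(g, []).append(p)
    return {g: inflation_factor(ps) for g, ps in by_group.items()}
```

Comparing the output of `lambda_by_group` across groups with the single value from `inflation_factor` on all genes is one way to see whether a universal factor would hide differences between gene categories, which is the concern the abstract raises.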