Genotype Imputation for African Americans Using Data From HapMap Phase II Versus 1000 Genomes Projects
Author(s) -
Sung Yun J.,
Gu C. Charles,
Tiwari Hemant K.,
Arnett Donna K.,
Broeckel Ulrich,
Rao Dabeeru C.
Publication year - 2012
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21647
Subject(s) - international hapmap project , imputation (statistics) , single nucleotide polymorphism , linkage disequilibrium , 1000 genomes project , genome wide association study , genetics , genotype , biology , computational biology , statistics , missing data , mathematics , gene
Genotype imputation infers untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel such as those from the HapMap Project. It is popular for increasing statistical power and for comparing results across studies that use different genotyping platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and because no ideal reference panel is available due to admixture. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because investigators are increasingly using data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based imputations and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on the Affymetrix SNP Array 6.0 genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations: about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield and the second-highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, which reduced the accuracy of the union strategy. 
Our findings suggest that 1KG-based imputations can facilitate discovery of significant associations for SNPs across the whole MAF spectrum. Because the 1KG Project is still under way, we expect that later versions will provide better imputation performance. Genet. Epidemiol. 36:508-516, 2012. © 2012 Wiley Periodicals, Inc.
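
The three panel strategies described in the abstract can be illustrated with simple set operations. The following is a minimal sketch, not the authors' pipeline: the SNP IDs, allele frequencies, and quality scores are made-up toy values, and the function names are hypothetical. In particular, the abstract does not say how the merge strategy resolves SNPs imputed in both runs; keeping the call with the higher quality score is an assumption made here purely for illustration.

```python
# Toy minor allele frequencies (MAF) per SNP in each reference panel.
# All IDs and values are hypothetical, chosen only for illustration.
ceu_maf = {"rs1": 0.20, "rs2": 0.00, "rs3": 0.05, "rs4": 0.31}
yri_maf = {"rs1": 0.15, "rs2": 0.12, "rs3": 0.00, "rs4": 0.40}


def polymorphic(maf_by_snp):
    """Return the set of SNP IDs with MAF > 0 in a panel."""
    return {snp for snp, maf in maf_by_snp.items() if maf > 0}


ceu_poly = polymorphic(ceu_maf)
yri_poly = polymorphic(yri_maf)

# Intersection strategy: SNPs polymorphic in both CEU and YRI.
intersection_panel = ceu_poly & yri_poly
# Union strategy: SNPs polymorphic in either CEU or YRI.
union_panel = ceu_poly | yri_poly
# SNPs polymorphic only in CEU (the group the abstract reports as less accurate).
ceu_only = ceu_poly - yri_poly


def merge_imputations(ceu_result, yri_result):
    """Merge two per-SNP imputation results of the form {snp: (dosage, quality)}.

    ASSUMPTION (not from the paper): when both runs imputed the same SNP,
    keep the call with the higher quality score.
    """
    merged = dict(ceu_result)
    for snp, call in yri_result.items():
        if snp not in merged or call[1] > merged[snp][1]:
            merged[snp] = call
    return merged


# Hypothetical results from two separate imputation runs.
ceu_run = {"rs1": (0.4, 0.70), "rs3": (0.1, 0.55)}
yri_run = {"rs1": (0.3, 0.90), "rs2": (0.2, 0.80)}
merged_run = merge_imputations(ceu_run, yri_run)

print(sorted(intersection_panel), sorted(union_panel), sorted(ceu_only))
print(merged_run)
```

In this toy example the union panel contains every SNP in the intersection panel plus those polymorphic in only one population, which mirrors the yield ordering reported in the abstract (union highest, intersection lowest).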