Open Access
Comparison of Multiple Imputation Algorithms and Verification Using Whole-Genome Sequencing in the CMUH Genetic Biobank
Author(s) - Ting-Yuan Liu, Lin Chen, Hsing-Tsung Wu, Yalun Wu, YuChia Chen, Chi-Chou Liao, YuPao Chou, Dysan Chao, HsingFang Lu, YaSian Chang, JanGowth Chang, KaiCheng Hsu, FuuJen Tsai
Publication year - 2021
Publication title - biomedicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.498
H-Index - 26
eISSN - 2211-8039
pISSN - 2211-8020
DOI - 10.37796/2211-8039.1302
Subject(s) - imputation (statistics), biobank, genome wide association study, data mining, computer science, single nucleotide polymorphism, genetic association, missing data, machine learning, bioinformatics, genetics, biology, genotype, gene
A genome-wide association study (GWAS) can systematically analyze the contributions of genetic factors to a wide variety of complex diseases. However, existing GWASs have yielded data that are highly specific to the ethnic groups studied. To provide data specific to Taiwan, we therefore established a large-scale genetic database at a single medical institution, the China Medical University Hospital (CMUH). Owing to current technological limitations, microarray analysis can detect only a limited number of single-nucleotide polymorphisms (SNPs) with a minor allele frequency of >1%. Imputation offers a useful means of expanding these data. In this study, we compared four imputation algorithms on several metrics. Among them, Beagle5.2 achieved the fastest calculation speed, the smallest storage requirement, the highest specificity, and the largest number of high-quality variants. Using Beagle5.2, we obtained 15,277,414 high-quality variants in 175,871 people. In internal verification, Beagle5.2 reached an accuracy rate of up to 98.75%. In external verification, our imputed variants had a 79.91% mapping rate and 90.41% accuracy. These results will be combined with clinical data in future research. We have made the results available for researchers to use in formulating imputation algorithms, and we have established a complete SNP database for GWAS and polygenic risk score (PRS) researchers. We believe that these data can help improve overall medical capabilities, particularly precision medicine, in Taiwan.
