Autosomal deletion/insertion polymorphisms for global stratification analyses and ancestry origin inferences of different continental populations by machine learning methods
Author(s) -
Jin Xiaoye,
Liu Yuluo,
Zhang Yuanyuan,
Li Yongle,
Chen Chuanliang,
Wang Hongdan
Publication year - 2021
Publication title -
electrophoresis
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.666
H-Index - 158
eISSN - 1522-2683
pISSN - 0173-0835
DOI - 10.1002/elps.202100044
Subject(s) - pairwise comparison, population, bayes' theorem, population stratification, biology, ancestry informative marker, evolutionary biology, divergence (linguistics), population genetics, naive bayes classifier, genetics, allele frequency, statistics, artificial intelligence, mathematics, allele, bayesian probability, demography, computer science, genotype, single nucleotide polymorphism, gene, support vector machine, linguistics, philosophy, sociology
Extensive population data have been reported for the 30 deletion/insertion polymorphisms (DIPs) of the Investigator DIPplex kit in different continental populations. Here, we assessed the genetic distributions of these 30 DIPs in different continental populations to pinpoint candidate ancestry-informative DIPs. We also explored the effectiveness of machine learning methods for ancestry analysis. Pairwise informativeness (In) values of the 30 DIPs revealed that six loci displayed relatively high In values (>0.1) among different continental populations. In addition, more loci showed high population-specific divergence (PSD) values in the African population than in the other populations. Based on the pairwise In and PSD values of the 30 DIPs, 17 DIPs in the Investigator DIPplex kit were selected for ancestry analyses of African, European, and East Asian populations. Although the full set of 30 DIPs provided better ancestry resolution of these continental populations in PCA and population genetic structure analyses, the 17 DIPs could also distinguish these populations. More importantly, the 17 DIPs showed more balanced cumulative PSD distributions across these populations. Six machine learning methods were used to perform ancestry analyses of these continental populations based on the 17 DIPs. Naïve Bayes performed best, whereas k-nearest neighbor performed relatively poorly. In summary, these machine learning methods, especially naïve Bayes, could serve as valuable tools for ancestry analysis.
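To make the pairwise In screening concrete, the sketch below computes an informativeness value for one locus from allele frequencies in two or more populations. It assumes that In here refers to the informativeness-for-assignment statistic of Rosenberg et al. (natural-log form); the abstract does not state the exact formula. The function name `informativeness` and the example frequencies are hypothetical illustrations, not data from the study.

```python
import math

def informativeness(freqs):
    """Informativeness for assignment (In) at one locus.

    freqs: one list of allele frequencies per population; every list
    covers the same alleles in the same order and sums to 1.
    Assumed formula (Rosenberg et al.):
        In = sum_j [ -p_j * ln(p_j) + sum_i (p_ij / K) * ln(p_ij) ]
    where p_j is the mean frequency of allele j over the K populations.
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        p_bar = sum(pop[j] for pop in freqs) / k
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)
        for pop in freqs:
            if pop[j] > 0:  # 0 * ln(0) is treated as 0
                total += pop[j] / k * math.log(pop[j])
    return total

# Hypothetical insertion/deletion allele frequencies for one DIP
# in two populations (illustrative values only).
pop_a = [0.80, 0.20]
pop_b = [0.30, 0.70]
value = informativeness([pop_a, pop_b])
# Threshold of 0.1 mirrors the "relatively high In" cutoff in the abstract.
print(f"In = {value:.4f}, above 0.1: {value > 0.1}")
```

Under this formula a locus with identical frequencies in both populations scores 0, and a locus fixed for different alleles in two populations scores ln 2, so the 0.1 cutoff mentioned in the abstract sits toward the low end of that range.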