Open Access
The importance of disease incidence rate on performance of GBLUP, threshold BayesA and machine learning methods in original and imputed data set
Author(s) -
Yousef Naderi,
Saadat Sadeghi
Publication year - 2020
Publication title -
Spanish Journal of Agricultural Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.337
H-Index - 36
eISSN - 2171-9292
pISSN - 1695-971X
DOI - 10.5424/sjar/2020183-15228
Subject(s) - boosting (machine learning) , imputation (statistics) , statistics , linkage disequilibrium , single nucleotide polymorphism , genotype , genomic selection , random forest , mathematics , artificial intelligence , computer science , machine learning , biology , missing data , genetics , gene
Aim of study: To evaluate genomic prediction accuracy for binary traits under different rates of disease incidence.
Area of study: Simulation.
Material and methods: Two machine learning algorithms, Boosting and Random Forest (RF), as well as threshold BayesA (TBA) and genomic BLUP (GBLUP), were employed. The predictive ability of the methods was evaluated for different genomic architectures using imputed genotypes (i.e. 2.5K, 12.5K and 25K panels) and the original 50K genotypes. Three strategies with different rates of disease incidence (16%, 50% and 84% threshold points) were evaluated for their effects on genomic prediction accuracy.
Main results: Genotype imputation performed poorly for the predictive ability of the GBLUP, RF, Boosting and TBA methods when the low-density single nucleotide polymorphism (SNP) chip was used in low linkage disequilibrium (LD) scenarios. The highest predictive ability of the GBLUP, RF, Boosting and TBA methods was obtained when the rate of disease incidence in the training set was 16%. Across different genomic architectures, the Boosting method performed better than the TBA, GBLUP and RF methods for all scenarios and proportions of the marker sets imputed. RF showed a larger reduction in predictive ability than Boosting, TBA and GBLUP, especially when the data set contained the 2.5K panel of imputed genotypes.
Research highlights: Generally, considering the high sensitivity of the methods to imputation errors, the application of imputed genotypes with the RF method should be carefully evaluated.
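
The abstract does not say how the 16%, 50% and 84% threshold points were applied. As a minimal sketch, assuming a standard liability threshold model in which an individual is affected when its standard-normal liability exceeds a cutoff, the incidence rates can be mapped to cutoffs as follows. The function names `liability_threshold` and `simulate_binary_trait` are hypothetical and are not taken from the paper.

```python
import random
from statistics import NormalDist


def liability_threshold(incidence: float) -> float:
    """Cutoff on a standard-normal liability scale that a fraction
    `incidence` of individuals exceeds (assumed model, not the paper's code)."""
    return NormalDist().inv_cdf(1.0 - incidence)


def simulate_binary_trait(n: int, incidence: float, seed: int = 0) -> list[int]:
    """Draw n standard-normal liabilities and code them 1 (affected)
    when they exceed the cutoff for the requested incidence, else 0."""
    rng = random.Random(seed)
    cutoff = liability_threshold(incidence)
    return [1 if rng.gauss(0.0, 1.0) > cutoff else 0 for _ in range(n)]


if __name__ == "__main__":
    for incidence in (0.16, 0.50, 0.84):
        cutoff = liability_threshold(incidence)
        phenotypes = simulate_binary_trait(20_000, incidence)
        observed = sum(phenotypes) / len(phenotypes)
        print(f"incidence={incidence:.2f} cutoff={cutoff:+.2f} observed={observed:.3f}")
```

Under this assumption, the three rates correspond to cutoffs of roughly one standard deviation above the mean, the mean itself, and one standard deviation below the mean.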
