Potential of Low‐Coverage Genotyping‐by‐Sequencing and Imputation for Cost‐Effective Genomic Selection in Biparental Segregating Populations
Author(s) - Gorjanc Gregor, Dumasy Jean‐Francois, Gonen Serap, Gaynor R. Chris, Antolin Roberto, Hickey John M.
Publication year - 2017
Publication title - Crop Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.76
H-Index - 147
eISSN - 1435-0653
pISSN - 0011-183X
DOI - 10.2135/cropsci2016.08.0675
Subject(s) - imputation (statistics), biology, genotyping, single nucleotide polymorphism, genomic selection, snp, selection (genetic algorithm), statistics, genetics, genotype, computational biology, computer science, missing data, mathematics, machine learning, gene
Genotyping‐by‐sequencing (GBS) is an alternative genotyping method to single‐nucleotide polymorphism (SNP) arrays that has received considerable attention in the plant breeding community. In this study we use simulation to quantify the potential of low‐coverage GBS and imputation for cost‐effective genomic selection in biparental segregating populations. The simulations comprised a range of scenarios where SNP array or GBS data were used to train the genomic selection model, to predict breeding values, or both. The GBS data were generated with sequencing coverages (x) from 4x to 0.01x. The data were used either nonimputed or imputed by the AlphaImpute program. The size of the training and prediction sets was either held fixed or was increased by reducing sequencing coverage per individual. The results show that nonimputed 1x GBS data provided prediction accuracy and bias comparable to the SNP array data and, on the measure of return on investment used in this study, outperformed them. Imputation allowed for further reduction in sequencing coverage, to as low as 0.1x with 10,000 markers or 0.01x with 100,000 markers. The results suggest that using such data in biparental families gave up to 5.63 times higher return on investment than using the SNP array data. Reduction of sequencing coverage per individual and imputation can be leveraged to genotype larger training sets, which increases prediction accuracy, and larger prediction sets, which increases selection intensity; both allow for higher response to selection and higher return on investment.
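To give a feel for what a sequencing coverage of 4x, 1x, 0.1x, or 0.01x means for the data at a single marker, the following minimal sketch simulates read counts for one individual at one biallelic SNP. It is an illustration under stated assumptions, not the simulation procedure used in the study: read depth is assumed to follow a Poisson distribution with mean equal to the coverage, reads are assumed to sample the two alleles with equal probability, and sequencing error is ignored. The function names (`prob_no_reads`, `sample_poisson`, `simulate_reads`) are hypothetical.

```python
import math
import random


def prob_no_reads(coverage: float) -> float:
    """Probability of zero reads at a site, ASSUMING depth ~ Poisson(coverage)."""
    return math.exp(-coverage)


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw from Poisson(mean) with Knuth's multiplication method.

    Adequate for the small means (coverages of 4 or less) considered here.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_reads(rng: random.Random, dosage: int, coverage: float) -> tuple[int, int]:
    """Return (ref_reads, alt_reads) for one individual at one SNP.

    dosage is the true number of alternative alleles (0, 1, or 2).
    Assumption: each read carries the alternative allele with
    probability dosage / 2, and there is no sequencing error.
    """
    depth = sample_poisson(rng, coverage)
    alt_reads = sum(rng.random() < dosage / 2 for _ in range(depth))
    return depth - alt_reads, alt_reads


if __name__ == "__main__":
    rng = random.Random(2017)
    for coverage in (4.0, 1.0, 0.1, 0.01):
        ref, alt = simulate_reads(rng, dosage=1, coverage=coverage)
        print(
            f"{coverage}x: P(no reads) = {prob_no_reads(coverage):.3f}, "
            f"one simulated heterozygote: ref={ref}, alt={alt}"
        )
```

Under these assumptions, a large share of marker-by-individual combinations have no reads at all once coverage drops to 1x or below, which is the gap that imputation fills by borrowing information across linked markers and related individuals.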