Open Access
Expected Genotype Quality and Diploidized Marker Data from Genotyping‐by‐Sequencing of Urochloa spp. Tetraploids
Author(s) -
Matias Filipe Inácio,
Xavier Meireles Karem Guimarães,
Nagamatsu Sheila Tiemi,
Lima Barrios Sanzio Carvalho,
Borges do Valle Cacilda,
Carazzolle Marcelo Falsarella,
Fritsche‐Neto Roberto,
Endelman Jeffrey B.
Publication year - 2019
Publication title - The Plant Genome
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.403
H-Index - 41
ISSN - 1940-3372
DOI - 10.3835/plantgenome2019.01.0002
Subject(s) - biology , genotype , genotyping , genetics , genome , single nucleotide polymorphism , reference genome , gene
Core Ideas
Introduced concept of expected genotype quality (EGQ) and software to calculate it
Provided read depth guidelines for GBS in tetraploids
Developed software to generate diploidized genotype calls from VCF files
Demonstrated value of aligning GBS reads to a mock reference genome for SNP discovery
Recommend filtering based on GQ and read depth to prevent genotype bias

Although genotyping‐by‐sequencing (GBS) is a well‐established marker technology in diploids, the development of best practices for tetraploid species is a topic of current research. We determined the theoretical relationship between read depth and the phred‐scaled probability of genotype misclassification conditioned on the true genotype, which we call expected genotype quality (EGQ). If the GBS method has 0.5% allelic error, then 17 reads are needed to classify simplex tetraploids as heterozygous with 95% accuracy (EGQ = 13) vs. 61 reads to determine allele dosage. We developed an R script to convert tetraploid GBS data in variant call format (VCF) into diploidized genotype calls and applied it to 267 interspecific hybrids of the tetraploid forage grass Urochloa. When reads were aligned to a mock reference genome created from GBS data of the Urochloa brizantha (Hochst. ex A. Rich.) R. D. Webster cultivar Marandu, 25,678 biallelic single nucleotide polymorphisms (SNPs) were discovered, compared with ∼3000 SNPs when aligning to the closest true reference genomes, Setaria viridis (L.) P. Beauv. and S. italica (L.) P. Beauv. Cross‐validation revealed that missing genotypes were imputed by the random forest method with a median accuracy of 0.85 regardless of heterozygote frequency. Using the Urochloa spp. hybrids, we illustrated how filtering samples based only on genotype quality (GQ) creates genotype bias; a depth threshold based on EGQ is also needed regardless of whether genotypes are called using a diploidized or allele dosage model.
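
The phred scaling used for EGQ can be illustrated with a short sketch. This is not the authors' software: the function names are hypothetical, and the code only converts between a misclassification probability and a phred-scaled quality. It does not model read depth, ploidy, or allelic error, so it does not reproduce the read-depth figures quoted in the abstract.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Phred-scaled quality, -10 * log10(p), for a misclassification probability."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_from_phred(quality: float) -> float:
    """Inverse conversion: misclassification probability for a phred-scaled quality."""
    return 10.0 ** (-quality / 10.0)


# 95% classification accuracy means a 5% misclassification probability,
# which corresponds to the EGQ = 13 quoted in the abstract.
egq = phred_from_error(1.0 - 0.95)
print(round(egq))                        # 13
print(round(error_from_phred(13), 3))    # 0.05
```

Under this scaling, each additional 10 units of quality corresponds to a tenfold lower misclassification probability.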
