SNP discovery in nonmodel organisms: strand bias and base‐substitution errors reduce conversion rates
Author(s) -
Gonçalves da Silva Anders,
Barendse William,
Kijas James W.,
Barris Wes C.,
McWilliam Sean,
Bunch Rowan J.,
McCullough Russell,
Harrison Blair,
Hoelzel A. Rus,
England Phillip R.
Publication year - 2015
Publication title -
Molecular Ecology Resources
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.96
H-Index - 136
eISSN - 1755-0998
pISSN - 1755-098X
DOI - 10.1111/1755-0998.12343
Subject(s) - biology, snp, genetics, single nucleotide polymorphism, genotyping, snp genotyping, computational biology, molecular inversion probe, tag snp, genomics, genome, genotype, gene
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms apply a strategy for identifying putative SNPs based on filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep‐sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact of not accounting for systematic sequencing errors when filtering the polymorphisms identified during SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize ‘bycatch’, that is, polymorphisms caused by sequencing or assembly error. An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single‐copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.
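The abstract does not say how strand bias was quantified or which thresholds were applied, so the following is only an illustrative sketch of one common approach: a two-sided Fisher's exact test on the 2x2 table of allele (reference or variant) by read strand (forward or reverse), combined with simple depth and variant-read cut-offs. The function names, the cut-off values (`min_depth`, `min_alt_reads`, `alpha`) and the read counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    if n == 0:
        return 1.0
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def strand_bias_pvalue(ref_fwd: int, ref_rev: int,
                       alt_fwd: int, alt_rev: int) -> float:
    """Small p-values suggest the variant reads sit unevenly across strands."""
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)


def keep_candidate(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int,
                   min_depth: int = 8, min_alt_reads: int = 2,
                   alpha: float = 0.05) -> bool:
    """Hypothetical filter: enough depth, enough variant reads, no strand bias."""
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt_reads = alt_fwd + alt_rev
    if depth < min_depth or alt_reads < min_alt_reads:
        return False
    return strand_bias_pvalue(ref_fwd, ref_rev, alt_fwd, alt_rev) >= alpha


# Made-up read counts: a balanced site and a site whose variant reads
# all come from the forward strand.
balanced = keep_candidate(10, 10, 10, 10)
one_sided = keep_candidate(10, 10, 8, 0)
```

With these invented counts, the balanced site passes the filter and the site whose variant reads all lie on one strand is rejected. In a real pipeline the counts would come from the variant caller's per-strand read tallies, and the cut-offs would need tuning against genotyping outcomes such as those reported in the study.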