SNP development from RNA-seq data in a nonmodel fish: how many individuals are needed for accurate allele frequency prediction?
Author(s) -
Schunter C.,
Garza J. C.,
Macpherson E.,
Pascual M.
Publication year - 2014
Publication title -
Molecular Ecology Resources
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.96
H-Index - 136
eISSN - 1755-0998
pISSN - 1755-098X
DOI - 10.1111/1755-0998.12155
Subject(s) - biology , genetics , genotyping , snp genotyping , allele frequency , single nucleotide polymorphism , population , snp , loss of heterozygosity , population genomics , population genetics , computational biology , genotype , genomics , allele , genome , gene , demography , sociology
Single nucleotide polymorphisms (SNPs) are rapidly becoming the marker of choice in population genetics due to a variety of advantages relative to other markers, including higher genomic density, data quality, reproducibility and genotyping efficiency, as well as ease of portability between laboratories. Advances in sequencing technology and methodologies to reduce genomic representation have made the isolation of SNPs feasible for nonmodel organisms. RNA-seq is one such technique for the discovery of SNPs and development of markers for large-scale genotyping. Here, we report the development of 192 validated SNP markers for parentage analysis in Tripterygion delaisi (the black-faced blenny), a small rocky-shore fish from the Mediterranean Sea. RNA-seq data for 15 individual samples were used for SNP discovery by applying a series of selection criteria. Genotypes were then collected from 1599 individuals from the same population with the resulting loci. Differences in heterozygosity and allele frequencies were found between the two data sets. Heterozygosity was lower, on average, in the population sample, and the mean difference between the frequencies of particular alleles in the two data sets was 0.135 ± 0.100. We used bootstrap resampling of the sequence data to predict appropriate sample sizes for SNP discovery. As cDNA library production is time-consuming and expensive, we suggest that using seven individuals for RNA sequencing reduces the probability of discarding highly informative SNP loci, due to lack of observed polymorphism, whereas use of more than 12 samples does not considerably improve prediction of true allele frequencies.
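
The bootstrap idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes diploid genotypes coded as alternate-allele counts (0, 1 or 2), resamples individuals with replacement, and uses hypothetical function names and hypothetical genotype data. For a given discovery sample size it estimates how far the sampled allele frequency tends to fall from the full-sample frequency, and how often the locus would appear polymorphic.

```python
import random


def allele_frequency(genotypes):
    """Alternate-allele frequency for diploid genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def bootstrap_subsample(genotypes, n, replicates=1000, seed=0):
    """Resample n individuals with replacement, `replicates` times.

    Returns (mean absolute frequency error, fraction of replicates in
    which the locus is observed as polymorphic). Hypothetical helper,
    written for illustration only.
    """
    rng = random.Random(seed)
    reference = allele_frequency(genotypes)
    total_error = 0.0
    polymorphic = 0
    for _ in range(replicates):
        sample = [rng.choice(genotypes) for _ in range(n)]
        freq = allele_frequency(sample)
        total_error += abs(freq - reference)
        if 0.0 < freq < 1.0:
            polymorphic += 1
    return total_error / replicates, polymorphic / replicates


if __name__ == "__main__":
    # Hypothetical locus: 1000 individuals, alternate allele fairly rare.
    population = [0] * 810 + [1] * 180 + [2] * 10
    for n in (3, 7, 12, 15):
        error, detected = bootstrap_subsample(population, n)
        print(f"n={n:2d}  mean |error|={error:.3f}  P(polymorphic)={detected:.3f}")
```

Running the script for several values of `n` shows the trade-off described above: small samples more often miss the polymorphism entirely, while the frequency error shrinks more slowly as the sample grows.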