Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics
Author(s) - Nevado B., Ramos-Onsins S. E., Pérez-Enciso M.
Publication year - 2014
Publication title - Molecular Ecology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.619
H-Index - 225
eISSN - 1365-294X
pISSN - 0962-1083
DOI - 10.1111/mec.12693
Subject(s) - biology , genome , genomics , genotype , haplotype , range (aeronautics) , population genomics , population , reference genome , computational biology , genetics , evolutionary biology , gene , materials science , demography , sociology , composite material
Decreasing costs of next‐generation sequencing (NGS) experiments have made a wide range of genomic questions open for study with nonmodel organisms. However, experimental designs and analysis of NGS data from less well‐known species are challenging because of the lack of genomic resources. In this work, we investigate the performance of alternative experimental designs and bioinformatics approaches in estimating variability and neutrality tests based on the site‐frequency‐spectrum (SFS) from individual resequencing data. We pay particular attention to challenges faced in the study of nonmodel organisms, notably the absence of a species‐specific reference genome, although phylogenetically close genomes are assumed to be available. We compare the performance of three alternative bioinformatics approaches: genotype calling, genotype–haplotype calling and direct estimation without calling genotypes. We find that relying on genotype calls provides biased estimates of population genetic statistics at low to moderate read depth (2–8×). Genotype–haplotype calling returns more accurate estimates irrespective of the divergence to the reference genome, but requires moderate depth (8–20×). Direct estimation without calling genotypes returns the most accurate estimates of variability and of most SFS tests investigated, including at low read depth (2–4×). Studies without a species‐specific reference genome should thus aim for low read depth and avoid genotype calling whenever individual genotypes are not essential. Otherwise, aiming for moderate to high depth at the expense of number of individuals, and using genotype–haplotype calling, is recommended.
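To make "variability and neutrality tests based on the SFS" concrete, here is a minimal Python sketch that computes two variability estimators and one neutrality test from an unfolded SFS. The abstract does not name the statistics examined, so the choice of Watterson's theta, nucleotide diversity (pi) and Tajima's D is an assumption made for illustration. The function name `sfs_summary` and the input layout are likewise hypothetical and not taken from the paper. Because all three values are functions of the SFS counts alone, any bias in the estimated SFS carries over directly into them.

```python
from math import sqrt


def sfs_summary(sfs):
    """Return (theta_w, pi, tajima_d) for an unfolded SFS.

    Assumed layout (hypothetical): sfs[i - 1] is the number of segregating
    sites at which the derived allele is seen in i of n sampled chromosomes,
    for i = 1 .. n-1. Estimates are per region, not per site.
    """
    n = len(sfs) + 1                      # number of sampled chromosomes
    if n < 3:
        raise ValueError("need at least 3 chromosomes")
    s = sum(sfs)                          # number of segregating sites

    # Harmonic sums used by Watterson's estimator and Tajima's variance.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    # Watterson's theta: segregating sites scaled by a1.
    theta_w = s / a1

    # Nucleotide diversity: mean number of pairwise differences.
    pairs = n * (n - 1) / 2.0
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    var = e1 * s + e2 * s * (s - 1)

    # D is undefined when there are no segregating sites.
    tajima_d = (pi - theta_w) / sqrt(var) if var > 0 else float("nan")
    return theta_w, pi, tajima_d


# An SFS proportional to 1/i (the neutral expectation) gives theta_w == pi.
neutral = sfs_summary([12, 6, 4, 3])
# An SFS made only of singletons pulls pi below theta_w, so D is negative.
singletons = sfs_summary([10, 0, 0, 0])
```

In this sketch the input is a plain list of counts, so it can be fed with an SFS obtained by any of the three approaches compared in the abstract. How that SFS is obtained from the reads is outside what the example covers.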