Open Access
Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets
Author(s) - Bohling Justin
Publication year - 2020
Publication title - Ecology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.17
H-Index - 63
ISSN - 2045-7758
DOI - 10.1002/ece3.6483
Subject(s) - genome , reference genome , biology , genomics , population genomics , population , sequence assembly , evolutionary biology , computational biology , phylogenetic tree , genotyping , genetics , gene , genotype , demography , gene expression , transcriptome , sociology
The advent of high‐throughput sequencing (HTS) has made genomic‐level analyses feasible for nonmodel organisms. A critical step of many HTS pipelines involves aligning reads to a reference genome to identify variants. Despite recent initiatives, only a fraction of species have publicly available reference genomes. Therefore, a common practice is to align reads to the genome of an organism related to the target species; however, this could affect read alignment and bias genotyping. In this study, I conducted an experiment using empirical RADseq datasets generated for two species of salmonids (Actinopterygii; Teleostei; Salmonidae) to address these questions. There are currently reference genomes for six salmonids of varying phylogenetic distance. I aligned the RADseq data to all six genomes and identified variants with several different genotypers; the resulting variants were then fed into population genetic analyses. Increasing phylogenetic distance between target species and reference genome reduced both the proportion of reads that aligned successfully and mapping quality. The reference genome also influenced the number of SNPs generated and the depth at those SNPs, although the effect varied by genotyper. Inferences of population structure were mixed: increasing reference genome divergence reduced estimates of differentiation, but similar patterns of population relationships were found across scenarios. These findings reveal how the choice of reference genome can influence the output of bioinformatic pipelines. They also emphasize the need to identify best practices and guidelines for the burgeoning field of biodiversity genomics.
