ddradseqtools: a software package for in silico simulation and testing of double‐digest RADseq experiments
Author(s) -
Mora‐Márquez F.,
García‐Olivares V.,
Emerson B. C.,
López de Heredia U.
Publication year - 2017
Publication title -
Molecular Ecology Resources
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.96
H-Index - 136
eISSN - 1755-0998
pISSN - 1755-098X
DOI - 10.1111/1755-0998.12550
Subject(s) - in silico , software , biology , computational biology , indel , pipeline (software) , preprocessor , computer science , data mining , genetics , single nucleotide polymorphism , gene , artificial intelligence , operating system , genotype
Double‐digested RADseq (ddRADseq) is an NGS methodology that generates reads from thousands of loci targeted by restriction enzyme cut sites, across multiple individuals. To be statistically sound and economically optimal, a ddRADseq experiment needs a preliminary design stage that considers the choice of enzymes, particular features of the genome of the focal species, possible modifications to the library construction protocol, the coverage needed to minimize missing data, and the potential sources of error that may affect coverage. We present ddradseqtools, a software package that supports ddRADseq experimental design by (i) generating in silico double‐digested fragments; (ii) constructing modified ddRADseq libraries using adapters with either one or two indexes and degenerate base regions (DBRs) to quantify PCR duplicates; and (iii) performing the initial steps of the bioinformatic preprocessing of reads. ddradseqtools generates single‐end (SE) or paired‐end (PE) reads that may bear SNPs and/or indels. The effects of allele dropout and PCR duplicates on coverage are also simulated. The resulting output files can be submitted to alignment and variant‐calling pipelines to allow fine‐tuning of their parameters. The software was validated with specific tests of correct program operation. The correspondence between in silico settings and parameters from in vitro ddRADseq experiments was assessed to provide guidelines for reliable performance of the software. ddradseqtools is cost‐efficient in terms of execution time and can be run on computers with standard CPU and RAM configurations.
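The in silico double digestion in step (i) can be illustrated with a short Python sketch. It is not ddradseqtools' own code: the enzyme pair (EcoRI and MseI), the names `Enzyme`, `Fragment`, `cut_positions` and `double_digest`, and the size window are hypothetical choices made for illustration. The sketch locates the cut sites of both enzymes, keeps only fragments flanked by one cut from each enzyme, and applies a size selection.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # recognition sequence; both sites used below are palindromic
    cut: int   # forward-strand cut offset within the site (G^AATTC -> 1)


@dataclass(frozen=True)
class Fragment:
    start: int  # forward-strand cut coordinate of the left end
    end: int    # forward-strand cut coordinate of the right end
    left: str   # enzyme that produced the left end
    right: str  # enzyme that produced the right end
    seq: str


# Hypothetical enzyme pair for illustration (not a package default).
ECORI = Enzyme("EcoRI", "GAATTC", 1)  # G^AATTC
MSEI = Enzyme("MseI", "TTAA", 1)      # T^TAA


def cut_positions(seq: str, enzyme: Enzyme) -> list[int]:
    """Forward-strand cut coordinates of every (possibly overlapping) site."""
    pattern = re.compile(f"(?={re.escape(enzyme.site)})")
    return [m.start() + enzyme.cut for m in pattern.finditer(seq)]


def double_digest(seq: str, enz1: Enzyme, enz2: Enzyme,
                  min_len: int, max_len: int) -> list[Fragment]:
    """Digest with two enzymes and keep enz1/enz2-flanked fragments in a size window.

    Simplifications (assumptions of this sketch): only palindromic sites are
    scanned on the forward strand, sticky-end overhangs are ignored, and the
    fragment length is the distance between consecutive cut coordinates.
    """
    seq = seq.upper()
    cuts = sorted(
        [(p, enz1.name) for p in cut_positions(seq, enz1)]
        + [(p, enz2.name) for p in cut_positions(seq, enz2)]
    )
    kept = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # Fragments with the same enzyme at both ends lack one of the two
        # adapter types of a ddRAD library, so this sketch discards them.
        if {left, right} != {enz1.name, enz2.name}:
            continue
        if min_len <= end - start <= max_len:  # in silico size selection
            kept.append(Fragment(start, end, left, right, seq[start:end]))
    return kept
```

Running `double_digest` on a reference sequence for a candidate enzyme pair and size window, and counting the fragments it returns, gives a rough estimate of the number of targeted loci before committing to an in vitro experiment. Real enzymes with ambiguous (IUPAC) or non‐palindromic sites would need reverse‐strand scanning, which this sketch omits.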