ddrage: A data set generator to evaluate ddRADseq analysis software
Author(s) -
Timm Henning,
Weigand Hannah,
Weiss Martina,
Leese Florian,
Rahmann Sven
Publication year - 2018
Publication title -
Molecular Ecology Resources
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.96
H-Index - 136
eISSN - 1755-0998
pISSN - 1755-098X
DOI - 10.1111/1755-0998.12743
Subject(s) - software , set (abstract data type) , computer science , biology , data mining , dna sequencing , data set , genome , computational biology , genetics , artificial intelligence , gene , programming language
High‐throughput sequencing makes it possible to evaluate thousands of genetic markers across genomes and populations. Reduced‐representation sequencing approaches, like double‐digest restriction site‐associated DNA sequencing (ddRADseq), are frequently applied to screen for genetic variation. Particularly in nonmodel organisms, where whole‐genome sequencing is not yet feasible, ddRADseq has become popular because it allows genomewide assessment of variation patterns even in the absence of other genomic resources. However, while many tools are available for the analysis of ddRADseq data, few options exist to simulate ddRADseq data in order to evaluate the accuracy of downstream tools. The available tools either focus on optimizing the ddRAD experiment design or do not provide the information necessary for a detailed evaluation of different ddRAD analysis tools. Such an evaluation requires a ground truth, that is, the underlying information about all effects in the data set. Therefore, we present ddrage, the ddRAD Data Set Generator, which allows both developers and users to evaluate their ddRAD analysis software. ddrage lets the user adjust many parameters, such as coverage and the rates of mutations, sequencing errors or allelic dropouts, in order to generate a realistic simulated ddRADseq data set for given experimental scenarios and organisms. The simulated reads can easily be processed with available analysis software such as stacks or pyrad and evaluated against the underlying parameters used to generate the data, to gauge the impact of different parameter values used during downstream data processing.
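To illustrate what "simulated reads plus a ground truth" means in practice, here is a minimal toy sketch in Python. It is not ddrage's implementation or interface: the enzyme pair (EcoRI cutting G^AATTC, MspI cutting C^CGG), the size window, and the functions `double_digest` and `simulate_reads` are hypothetical choices for illustration. It digests a random genome in silico, keeps fragments flanked by one site of each enzyme, and sequences them with substitution errors while logging every injected error. Mutations, allelic dropout, and other effects mentioned above are deliberately left out.

```python
import random
import re

# Hypothetical enzyme pair: recognition site and cut offset within the site.
# EcoRI cuts G^AATTC and MspI cuts C^CGG; used here purely as an illustration.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MspI": ("CCGG", 1)}


def cut_positions(genome, site, offset):
    """Return genome positions where an enzyme cuts (overlapping matches included)."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", genome)]


def double_digest(genome, enz1="EcoRI", enz2="MspI", min_len=20, max_len=500):
    """Return (start, end) of fragments cut by a different enzyme at each end,
    keeping only those inside the size-selection window."""
    cuts = sorted(
        [(p, enz1) for p in cut_positions(genome, *ENZYMES[enz1])]
        + [(p, enz2) for p in cut_positions(genome, *ENZYMES[enz2])]
    )
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


def simulate_reads(genome, fragments, read_len, coverage, error_rate, rng):
    """Sequence the first read_len bases of each fragment `coverage` times,
    injecting substitution errors. Returns the reads and a ground-truth log
    of (fragment id, copy, read position, true base, erroneous base)."""
    reads, truth = [], []
    for frag_id, (start, end) in enumerate(fragments):
        template = genome[start:min(end, start + read_len)]
        for copy in range(coverage):
            bases = list(template)
            for i, base in enumerate(bases):
                if rng.random() < error_rate:
                    new = rng.choice([b for b in "ACGT" if b != base])
                    truth.append((frag_id, copy, i, base, new))
                    bases[i] = new
            reads.append("".join(bases))
    return reads, truth


# Usage: a random 20 kb "genome", 10x coverage, 1% substitution errors.
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(20000))
fragments = double_digest(genome)
reads, truth = simulate_reads(genome, fragments, read_len=100, coverage=10,
                              error_rate=0.01, rng=rng)
print(len(fragments), "fragments,", len(reads), "reads,", len(truth), "errors")
```

Because every injected error is logged, an analysis tool's output can be compared against `truth` directly. That comparison is the kind of evaluation a ground truth makes possible.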
