
The unexpected depths of genome‐skimming data: A case study examining Goodeniaceae floral symmetry genes
Author(s) -
Berger Brent A.,
Han Jiahong,
Sessa Emily B.,
Gardner Andrew G.,
Shepherd Kelly A.,
Ricigliano Vincent A.,
Jabaily Rachel S.,
Howarth Dianella G.
Publication year - 2017
Publication title -
Applications in Plant Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.64
H-Index - 23
ISSN - 2168-0450
DOI - 10.3732/apps.1700042
Subject(s) - biology, genome, gene, phylogenetic tree, nuclear gene, dna sequencing, genetics, computational biology, evolutionary biology
Premise of the study: The use of genome skimming allows systematists to quickly generate large data sets, particularly of sequences in high abundance (e.g., plastomes); however, researchers may be overlooking data in low abundance that could be used for phylogenetic or evo‐devo studies. Here, we present a bioinformatics approach that explores the low‐abundance portion of genome‐skimming next‐generation sequencing libraries in the fan‐flowered Goodeniaceae.

Methods: Twenty‐four previously constructed Goodeniaceae genome‐skimming Illumina libraries were examined for their utility in mining low‐copy nuclear genes involved in floral symmetry, specifically the CYCLOIDEA (CYC)‐like genes. De novo assemblies were generated using multiple assemblers, and BLAST searches were performed for CYC1, CYC2, and CYC3 genes.

Results: Overall, Trinity, SOAPdenovo‐Trans, and SOAPdenovo implementing lower k‐mer values uncovered the most data, although no assembler consistently outperformed the others. Using SOAPdenovo‐Trans across all 24 data sets, we recovered four CYC‐like gene groups (CYC1, CYC2, CYC3A, and CYC3B) from a majority of the species. Alignments of the fragments included the entire coding sequence as well as upstream and downstream regions.

Discussion: Genome‐skimming data sets can provide a significant source of low‐copy nuclear gene sequence data that may be used for multiple downstream applications.
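The abstract does not state the BLAST parameters or how hits were screened, so the following is only a minimal sketch of one way candidate CYC-like contigs could be pulled from an assembly's BLAST results. It assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) with CYC reference sequences as queries and assembled contigs as subjects. The function names `parse_outfmt6` and `candidate_contigs`, the thresholds `max_evalue` and `min_length`, and all sequence IDs are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str       # reference sequence, e.g. a hypothetical CYC1/CYC2/CYC3 query
    contig: str      # assembled contig from one genome-skimming library
    pident: float    # percent identity of the HSP
    length: int      # alignment length of the HSP
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def candidate_contigs(
    hits: Iterable[Hit], max_evalue: float = 1e-10, min_length: int = 100
) -> Dict[str, List[str]]:
    """For each query, return contigs passing the (hypothetical) e-value and
    length cutoffs, ranked by their best bit score, highest first."""
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for h in hits:
        if h.evalue > max_evalue or h.length < min_length:
            continue
        prev = best[h.query].get(h.contig)
        if prev is None or h.bitscore > prev:  # keep the best HSP per contig
            best[h.query][h.contig] = h.bitscore
    return {q: sorted(c, key=c.get, reverse=True) for q, c in best.items()}


# Toy rows with made-up IDs and values, for illustration only.
rows = [
    "CYC1_ref\tcontig_12\t88.5\t420\t40\t3\t1\t420\t15\t434\t1e-120\t450",
    "CYC1_ref\tcontig_7\t75.0\t90\t20\t1\t1\t90\t5\t94\t1e-5\t60",
    "CYC2_ref\tcontig_3\t91.2\t510\t30\t2\t1\t510\t100\t609\t0.0\t700",
    "CYC2_ref\tcontig_9\t80.1\t300\t50\t4\t1\t300\t1\t300\t1e-40\t200",
]
print(candidate_contigs(parse_outfmt6(rows)))
# {'CYC1_ref': ['contig_12'], 'CYC2_ref': ['contig_3', 'contig_9']}
```

Keeping only the highest-scoring HSP per contig and ranking contigs by bit score is one simple design choice. Short, low-coverage fragments of the kind described above would likely still need alignment and phylogenetic placement before being assigned to CYC1, CYC2, CYC3A, or CYC3B.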