Open Access
Building a Phylogenomic Pipeline for the Eukaryotic Tree of Life - Addressing Deep Phylogenies with Genome-Scale Data
Author(s) - Jessica R. Grant, Laura A. Katz
Publication year - 2014
Publication title - PLOS Currents
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.282
H-Index - 49
ISSN - 2157-3999
DOI - 10.1371/currents.tol.c24b6054aebf3602748ac042ccc8f2e9
Subject(s) - phylogenomics, tree of life (biology), pipeline (software), genome, computational biology, computer science, scripting language, python (programming language), tree (set theory), inference, biology, supermatrix, genomics, phylogenetic tree, coalescent theory, phylogenetics, data science, data mining, gene, genetics, artificial intelligence, clade, mathematical analysis, current algebra, operating system, mathematics, algebra over a field, affine lie algebra, pure mathematics, programming language
Background
Understanding the evolutionary relationships of all eukaryotes on Earth remains a paramount goal of modern biology, yet analyzing homologous sequences across 1.8 billion years of eukaryotic evolution is challenging. Many existing tools for identifying gene orthologs are inadequate when working with heterogeneous rates of evolution and endosymbiotic/lateral gene transfer. Moreover, genome-scale sequencing, which was once the domain of large sequencing centers, has advanced to the point where small laboratories can now generate the data needed for phylogenomic studies. This has opened the door to increased taxonomic sampling, as individual research groups can now conduct genome-scale projects on their favorite non-model organisms.

Results
Here we present some of the tools developed, and insights gained, as we created a pipeline that combines data mining from public databases with our own transcriptome data to study the eukaryotic tree of life. The first steps of a phylogenomic pipeline involve choosing taxa and loci, and making decisions about how to handle alleles, paralogs and non-overlapping sequences. Next, orthologs are aligned for analyses including gene tree reconstruction and concatenation for supermatrix approaches. To build our pipeline, we created scripts written in Python that integrate third-party tools with custom methods. As a test case, we present the placement of five amoebae on the eukaryotic tree of life based on analyses of transcriptome data. Our scripts are available on GitHub and may be used as-is for automated analyses in large-scale phylogenomics, or adapted for use in other types of studies.

Conclusion
Analyses on the scale of all eukaryotes present challenges not necessarily found in studies of more closely related organisms. Our approach will be of relevance to others for whom existing third-party tools fail to fully answer desired phylogenetic questions.
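The concatenation step mentioned in the Results (joining per-gene ortholog alignments into a supermatrix) can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `concatenate_alignments`, the dictionary-based alignment representation, and the taxon names are all assumptions made for illustration. Taxa missing from a given gene are padded with gap characters, which is one common way to handle the uneven sampling that such datasets tend to have.

```python
from typing import Dict, List

# Hypothetical representation: one alignment = {taxon name: aligned sequence}.
Alignment = Dict[str, str]


def concatenate_alignments(gene_alignments: List[Alignment],
                           missing_char: str = "-") -> Alignment:
    """Concatenate per-gene alignments into a single supermatrix.

    Taxa absent from a gene are padded with `missing_char` so that every
    row of the supermatrix ends up with the same total length.
    """
    # Collect every taxon seen in any gene, in a stable (sorted) order.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}

    for aln in gene_alignments:
        # All sequences within one gene alignment must share one length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in a gene alignment must have equal length")
        gene_length = lengths.pop()
        for taxon in taxa:
            # Use the real sequence if present, otherwise a block of gaps.
            pieces[taxon].append(aln.get(taxon, missing_char * gene_length))

    return {taxon: "".join(parts) for taxon, parts in pieces.items()}


# Toy example with made-up taxa and sequences.
gene_a = {"Amoeba_1": "MKV-L", "Ciliate_1": "MKVAL"}
gene_b = {"Amoeba_1": "GGT", "Plant_1": "GAT"}
supermatrix = concatenate_alignments([gene_a, gene_b])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

In this toy case every taxon ends up with a row of eight columns, with gaps standing in for the gene it lacks. A real pipeline would also need to record partition boundaries between genes, which this sketch omits.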
