Open Access
SFA-SPA: a suffix array based short peptide assembler for metagenomic data
Author(s) - Youngik Yang, Cuncong Zhong, Shibu Yooseph
Publication year - 2015
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btv052
Subject(s) - metagenomics, computer science, software, data mining, suffix, computational biology, process (computing), biology, programming language, gene, genetics, linguistics, philosophy
Determining protein sequences from a metagenomic dataset enables the study of the metabolism and functional roles of the organisms present in the sampled microbial community. We previously introduced an algorithm and software for the accurate reconstruction of protein sequences from short peptides identified on nucleotide reads in a metagenomic dataset. Here, we present significant computational improvements to the short peptide assembly algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads, while maintaining accuracy. The improved computational efficiency is achieved by using a suffix array data structure, which allows fast querying during the assembly process, and by significantly redesigning the assembly steps to enable multi-threaded execution.
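To illustrate why a suffix array makes querying fast, here is a minimal sketch in Python. It is not the SFA-SPA implementation: the function names, the `$` separator, and the toy peptides are assumptions made for illustration, and the naive construction shown here would not scale to hundreds of millions of reads. The point is only that once suffix start positions are sorted, every occurrence of a query string can be located with two binary searches.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction by sorting suffixes directly; fine for a toy example,
    far too slow and memory-hungry for real metagenomic data.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def _boundary(text, sa, pattern, upper):
    """Binary search over the suffix array.

    With upper=False, return the first index whose suffix prefix is >= pattern.
    With upper=True, return the first index whose suffix prefix is > pattern.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of every occurrence of `pattern` in `text`."""
    first = _boundary(text, sa, pattern, upper=False)
    last = _boundary(text, sa, pattern, upper=True)
    return sorted(sa[first:last])


if __name__ == "__main__":
    # Hypothetical short peptides, concatenated with '$' as a separator.
    peptides = ["MKVLA", "VLAGT", "AGTKM"]
    text = "$".join(peptides) + "$"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "VLA"))  # [2, 6]
    print(find_occurrences(text, sa, "AGT"))  # [8, 12]
```

Each query costs O(m log n) character comparisons for a pattern of length m over a text of length n, instead of a linear scan of all peptides. An overlap-based assembler issues such lookups repeatedly, which is why the index matters.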
