AllSome Sequence Bloom Trees

Open Access

Author(s) - Chen Sun, Robert S. Harris, Rayan Chikhi, Paul Medvedev

Publication year - 2018

Publication title - Journal of Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.585

H-Index - 95

eISSN - 1557-8666

pISSN - 1066-5277

DOI - 10.1089/cmb.2017.0258

Subject(s) - computer science, bloom filter, sequence (biology), tree (set theory), search engine indexing, set (abstract data type), upload, data structure, trie, data mining, computational biology, information retrieval, biology, algorithm, world wide web, mathematics, genetics, operating system, mathematical analysis, programming language

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing tree construction time by 52.7% and query time by 39% to 85%, at the cost of up to 3× the memory consumption during queries. Notably, it can process a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.
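
The abstract does not spell out how the tree is laid out, so the following is only a minimal sketch of the general idea suggested by the name, under stated assumptions: each node is assumed to keep an "all" set (k-mers shared by every sample below it and not already recorded at an ancestor) and a "some" set (k-mers found in at least one but not every sample below it). Plain Python sets stand in for Bloom filters, and the names `Node`, `build`, `query`, the toy k-mer length `K`, and the threshold `theta` are hypothetical, not taken from the article or its software.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

K = 4  # toy k-mer length (assumption; real indexes use a larger k)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Node:
    all_kmers: set    # in every leaf below, not already stored at an ancestor
    some_kmers: set   # in at least one, but not every, leaf below
    name: Optional[str] = None
    children: list = field(default_factory=list)


def _summarize(spec):
    """Return (union, intersection) of the k-mer sets of all leaves in spec."""
    if isinstance(spec, list):
        parts = [_summarize(child) for child in spec]
        union = set().union(*(u for u, _ in parts))
        inter = set.intersection(*(i for _, i in parts))
        return union, inter
    leaf_set = set(spec[1])
    return leaf_set, leaf_set


def build(spec, parent_inter=frozenset()):
    """spec is either a leaf (name, kmer_set) or a list of child specs.
    Summaries are recomputed per node, which is fine for a toy example."""
    union, inter = _summarize(spec)
    node = Node(all_kmers=inter - parent_inter, some_kmers=union - inter)
    if isinstance(spec, list):
        node.children = [build(child, inter) for child in spec]
    else:
        node.name = spec[0]
    return node


def leaf_names(node):
    if not node.children:
        return [node.name]
    return [n for child in node.children for n in leaf_names(child)]


def _search(node, q, needed, hits, out):
    hits += len(q & node.all_kmers)
    if hits >= needed:
        # Enough k-mers are shared by every leaf below: report the whole subtree.
        out.extend(leaf_names(node))
        return
    rest = q & node.some_kmers  # only these can still match further down
    if hits + len(rest) < needed:
        return  # the threshold is unreachable: prune this subtree
    for child in node.children:
        _search(child, rest, needed, hits, out)


def query(root, seq, theta):
    """Names of samples containing at least a fraction theta of seq's k-mers."""
    q = kmers(seq)
    needed = max(1, math.ceil(theta * len(q)))
    out = []
    _search(root, q, needed, 0, out)
    return out


# Toy data: three made-up "samples" arranged as the tree [[A, B], C].
root = build([[("A", kmers("ACGTACGT")), ("B", kmers("ACGTTTTT"))],
              ("C", kmers("GGGGCCCC"))])
print(query(root, "ACGTAC", 0.6))
print(query(root, "ACGT", 1.0))
```

In this sketch, a query can stop at an internal node as soon as the "all" sets seen on the way down already meet the threshold, which is one plausible way a split of this kind could shorten query time; the memory trade-off noted in the abstract is not modeled here.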
