De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm

Open Access

Author(s) - Kristoffer Sahlin, Paul Medvedev

Publication year - 2020

Publication title - Journal of Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.585

H-Index - 95

eISSN - 1557-8666

pISSN - 1066-5277

DOI - 10.1089/cmb.2019.0299

Subject(s) - cluster analysis, bottleneck, computer science, scalability, nanopore sequencing, data mining, algorithm, greedy algorithm, dna sequencing, biology, machine learning, gene, genetics, database, embedded system

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates). We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in overall accuracy, in scalability to large data sets, or in both.
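
The abstract names two ideas, a greedy pass over the reads and the use of quality values, without giving details. The following is a minimal illustrative sketch of how such a scheme could look. It is not the isONclust implementation: the function names, the Phred+33 encoding, the k-mer size, the shared k-mer threshold, and the use of "expected number of error-free k-mers" as the ordering score are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Read = Tuple[str, str, str]  # (name, sequence, Phred+33 quality string)


def error_probs(qual: str) -> List[float]:
    """Convert a Phred+33 quality string to per-base error probabilities."""
    return [10 ** (-(ord(c) - 33) / 10) for c in qual]


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_score(seq: str, qual: str, k: int) -> float:
    """Assumed score: expected number of error-free k-mers in the read."""
    probs = error_probs(qual)
    total = 0.0
    for i in range(len(seq) - k + 1):
        p_ok = 1.0
        for e in probs[i:i + k]:
            p_ok *= 1.0 - e
        total += p_ok
    return total


def greedy_cluster(reads: List[Read], k: int = 5,
                   min_shared: float = 0.5) -> Dict[str, int]:
    """Greedy clustering: process reads from best to worst score.

    Each read joins the cluster whose representative shares the largest
    fraction of its k-mers, provided that fraction reaches min_shared.
    Otherwise the read opens a new cluster and becomes its representative.
    """
    order = sorted(reads, key=lambda r: read_score(r[1], r[2], k),
                   reverse=True)
    representatives: List[Set[str]] = []  # index = cluster id
    assignment: Dict[str, int] = {}
    for name, seq, _qual in order:
        kmers = kmer_set(seq, k)
        best_id, best_frac = None, 0.0
        for cid, rep in enumerate(representatives):
            frac = len(kmers & rep) / len(kmers) if kmers else 0.0
            if frac > best_frac:
                best_id, best_frac = cid, frac
        if best_id is not None and best_frac >= min_shared:
            assignment[name] = best_id
        else:
            representatives.append(kmers)
            assignment[name] = len(representatives) - 1
    return assignment


reads = [
    ("r1", "ACGTACGTTGCAAGCT", "I" * 16),    # high quality
    ("r2", "ACGTACGTTGCAAGCA", "5" * 16),    # lower quality, one mismatch
    ("r3", "TTTTGGGGCCCCAAAATT", "I" * 18),  # unrelated sequence
]
clusters = greedy_cluster(reads)
```

In this toy input, `r1` and `r2` differ by a single base and end up in the same cluster, while `r3` forms its own. Processing high-scoring reads first means that cluster representatives tend to be the most reliable reads, which is one plausible way quality values can help when error rates vary between reads.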
