beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types

Aaron T. L. Lun, Hervé Pagès, Mike L. Smith

Open Access

Publication year - 2018

Publication title - PLOS Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.628

H-Index - 182

eISSN - 1553-7358

pISSN - 1553-734X

DOI - 10.1371/journal.pcbi.1006135

Subject(s) - bioconductor, computer science, interoperability, throughput, set (abstract data type), source code, representation (politics), code (set theory), sparse matrix, parallel computing, data set, external data representation, data structure, variety (cybernetics), data mining, programming language, artificial intelligence, operating system, biology, biochemistry, physics, quantum mechanics, politics, gaussian, gene, political science, law, wireless

Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
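
The idea of representation-agnostic access described in the abstract can be illustrated with a small C++ sketch. This is not beachmat's actual API: the names `MatrixView`, `DenseMatrix`, `SparseMatrix`, `get_col` and `column_sums` are hypothetical and chosen only for illustration. The sketch assumes a column-major dense layout (as in ordinary R matrices) and a compressed sparse column layout, and shows how one analysis routine might run unchanged on either.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface (not beachmat's real API): each representation
// reports its dimensions and can copy one column into a dense buffer.
class MatrixView {
public:
    virtual ~MatrixView() = default;
    virtual std::size_t nrow() const = 0;
    virtual std::size_t ncol() const = 0;
    // Overwrites 'out' with the nrow() values of column 'c'.
    virtual void get_col(std::size_t c, std::vector<double>& out) const = 0;
};

// Dense storage in column-major order.
class DenseMatrix : public MatrixView {
public:
    DenseMatrix(std::size_t nr, std::size_t nc, std::vector<double> values)
        : nr_(nr), nc_(nc), values_(std::move(values)) {
        assert(values_.size() == nr_ * nc_);
    }
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    void get_col(std::size_t c, std::vector<double>& out) const override {
        assert(c < nc_);
        out.assign(values_.begin() + c * nr_, values_.begin() + (c + 1) * nr_);
    }
private:
    std::size_t nr_, nc_;
    std::vector<double> values_;
};

// Compressed sparse column storage: only non-zero entries are kept.
class SparseMatrix : public MatrixView {
public:
    SparseMatrix(std::size_t nr, std::size_t nc,
                 std::vector<std::size_t> col_ptr,
                 std::vector<std::size_t> row_idx,
                 std::vector<double> values)
        : nr_(nr), nc_(nc), col_ptr_(std::move(col_ptr)),
          row_idx_(std::move(row_idx)), values_(std::move(values)) {
        assert(col_ptr_.size() == nc_ + 1);
        assert(row_idx_.size() == values_.size());
    }
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    void get_col(std::size_t c, std::vector<double>& out) const override {
        assert(c < nc_);
        out.assign(nr_, 0.0);  // start from zeros, then fill stored entries
        for (std::size_t k = col_ptr_[c]; k < col_ptr_[c + 1]; ++k) {
            out[row_idx_[k]] = values_[k];
        }
    }
private:
    std::size_t nr_, nc_;
    std::vector<std::size_t> col_ptr_, row_idx_;
    std::vector<double> values_;
};

// Analysis code written once against the interface; it never needs to
// know which representation backs the matrix.
std::vector<double> column_sums(const MatrixView& mat) {
    std::vector<double> sums(mat.ncol(), 0.0);
    std::vector<double> buffer;
    for (std::size_t c = 0; c < mat.ncol(); ++c) {
        mat.get_col(c, buffer);
        for (double v : buffer) {
            sums[c] += v;
        }
    }
    return sums;
}
```

For a 3-by-2 matrix with columns (1, 0, 3) and (0, 2, 0), `column_sums` returns 4 and 2 whether the data are held in `DenseMatrix` or `SparseMatrix`. A file-backed representation could, under the same assumptions, be added as a further subclass without touching `column_sums`. The sparse class stores fewer values but must rebuild each column on access, which is a simple instance of the memory/speed trade-off the abstract mentions.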
