Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark

Author(s) - Max C. Klein, Rati Sharma, Christopher H. Bohrer, Cameron M. Avelis, Elijah Roberts

Open Access

Publication year - 2016

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btw614

Subject(s) - spark (programming language), computer science, scalability, license, open source, source code, mit license, big data, code (set theory), domain (mathematical analysis), data mining, informatics, python (programming language), software, data science, database, programming language, operating system, mathematical analysis, mathematics, set (abstract data type), electrical engineering, engineering

Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology.
