Parallelizing the execution of native data mining algorithms for computational biology

Author(s) - Coro Gianpaolo, Candela Leonardo, Pagano Pasquale, Italiano Angela, Liccardo Loredana

Publication year - 2014

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3435

Subject(s) - computer science, scripting language, cloud computing, software, process (computing), code (set theory), python (programming language), distributed computing, source code, data mining, software engineering, programming language, set (abstract data type), operating system

Summary - Data mining is increasingly used in biology. Biologists are adopting prototyping languages, such as R and Matlab, to apply data mining algorithms to their data. As a result, their scripts are becoming increasingly complex and require frequent updates. Application to large datasets becomes impractical and the time-to-paper increases. Although various systems can process large datasets efficiently, for example through Cloud and High Performance Computing, they usually require procedures to be translated into specific languages or adapted to a particular computing platform. Such modifications can speed up processing, but translation is not automatic, especially in complex cases, and can require considerable programming effort and careful validation. In this paper, we propose an approach to parallelize data mining procedures that take the form of compiled software or R scripts developed by biology communities of practice. Our approach requires minimal alteration of the original code, and in many cases no code modification at all. It also allows fast updating when a new version is ready. We clarify the constraints and benefits of our method and report a practical use case that demonstrates these benefits compared with a standard execution. Our approach relies on a distributed network of web services and ultimately exposes the algorithms as-a-Service, to be invoked by remote thin clients. Copyright © 2014 John Wiley & Sons, Ltd.
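The general idea of parallelizing a native procedure without modifying its code can be illustrated with a minimal sketch. This is not the authors' system: the summary describes a distributed network of web services, whereas the sketch below assumes a local thread pool as a stand-in, and it assumes the input can be split into independent chunks whose outputs are simply concatenated. `NATIVE_CMD`, `split_into_chunks`, `run_native`, and `parallel_run` are hypothetical names introduced for illustration; the stand-in program just reports the length of each input record.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an unmodified native program or R script
# (in practice this could be something like ["Rscript", "model.R"]).
# It reads records on stdin and writes one result per record on stdout.
NATIVE_CMD = [sys.executable, "-c",
              "import sys\nfor line in sys.stdin:\n    print(len(line.strip()))"]


def split_into_chunks(records, n_chunks):
    """Partition the records into at most n_chunks contiguous slices."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_native(chunk, cmd=NATIVE_CMD):
    """Run the untouched program on one chunk and return its output lines."""
    result = subprocess.run(cmd, input="\n".join(chunk) + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def parallel_run(records, n_workers=4, cmd=NATIVE_CMD):
    """Fan the chunks out to workers and concatenate results in input order."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the order the chunks were submitted.
        outputs = pool.map(lambda c: run_native(c, cmd), chunks)
    return [line for out in outputs for line in out]


if __name__ == "__main__":
    sequences = ["ACGT", "AC", "ACGTACGT", "A", "ACG"]
    print(parallel_run(sequences, n_workers=2))
```

Because the wrapper only controls how input is partitioned and how the program is launched, swapping in a new version of the native program means changing the command, not the wrapper. The sketch only applies when records can be processed independently, which is one kind of constraint such an approach would need to state.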
