KernelHive: a new workflow‐based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs
Author(s) -
Rościszewski Paweł,
Czarnul Paweł,
Lewandowski Rafał,
Schally-Kacprzak Marcel
Publication year - 2015
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3719
Subject(s) - computer science, parallel computing, scalability, multi-core processor, workstation, distributed computing, scheduling (production processes), supercomputer, computation, computer cluster, operating system, algorithm, operations management, economics
Summary - The paper presents KernelHive, a new open‐source framework for multilevel parallelization of computations across clusters, cluster nodes, and, at the lowest level, both CPUs and GPUs within a single application. An application is modeled as a directed acyclic graph whose nodes can run in parallel and can be expanded automatically (node unrolling) depending on the number of computation units available. A methodology is proposed for parallelizing and mapping an application onto the environment, including selection of devices by a chosen optimizer, selection of the best grid configurations for the compute devices, and optimization of data partitioning and execution. One of several scheduling algorithms can be selected, taking into account criteria such as execution time and power consumption. An easy‐to‐use GUI is provided for modeling and monitoring, together with a repository of ready‐to‐use constructs and computational kernels. The methodology, execution times, and scalability are demonstrated for a distributed, parallel password‐breaking example run in a heterogeneous environment comprising a cluster and servers with varying numbers of nodes and both CPUs and GPUs. Additionally, the performance of the framework is compared with an MPI + OpenCL implementation of a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.
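The node-unrolling idea from the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not KernelHive's actual API: the Node type, the unrollable flag, and the unroll helper are all hypothetical names chosen to mirror the abstract's description of expanding a graph node into as many parallel instances as there are compute devices.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    unrollable: bool = False          # node may be expanded to match available devices
    children: list = field(default_factory=list)

def unroll(node: Node, num_devices: int) -> List[Node]:
    """Expand an unrollable node into one instance per compute device;
    each instance would receive a share of the input data partition."""
    if not node.unrollable or num_devices <= 1:
        return [node]
    return [Node(f"{node.name}[{i}]") for i in range(num_devices)]

# Example: a partitioner feeding an unrollable processing node, then a merger.
partition = Node("partition")
process = Node("process", unrollable=True)
merge = Node("merge")
partition.children.append(process)
process.children.append(merge)

# With 4 devices reported by the runtime, "process" becomes 4 parallel instances.
instances = unroll(process, num_devices=4)
partition.children = instances
for inst in instances:
    inst.children.append(merge)
print([n.name for n in partition.children])
# ['process[0]', 'process[1]', 'process[2]', 'process[3]']

In the paper's setting, the degree of unrolling would be driven by the devices the scheduler selects (CPUs and GPUs across cluster nodes) rather than a hard-coded count as in this toy example.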