KernelHive: a new workflow‐based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs
Author(s) -
Rościszewski Paweł,
Czarnul Paweł,
Lewandowski Rafał,
Schally-Kacprzak Marcel
Publication year - 2015
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3719
Subject(s) - computer science, parallel computing, scalability, multi-core processor, workstation, distributed computing, scheduling (production processes), supercomputer, computation, computer cluster, operating system, algorithm, operations management, economics
Summary - The paper presents KernelHive, a new open‐source framework for multilevel parallelization of computations across clusters, cluster nodes, and, at the lowest level, both CPUs and GPUs within a single application. An application is modeled as a directed acyclic graph whose nodes can run in parallel and can be expanded automatically (node unrolling) depending on the number of computation units available. A methodology is proposed for parallelizing and mapping an application onto the environment, including selection of devices by a chosen optimizer, selection of the best grid configurations for the compute devices, and optimization of data partitioning and execution. One of several scheduling algorithms can be selected, taking into account criteria such as execution time and power consumption. An easy‐to‐use GUI is provided for modeling and monitoring, together with a repository of ready‐to‐use constructs and computational kernels. The methodology, execution times, and scalability are demonstrated for a distributed, parallel password‐breaking example run in a heterogeneous environment comprising a cluster and servers with varying numbers of nodes and both CPUs and GPUs. Additionally, the performance of the framework is compared with an MPI + OpenCL implementation of a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.
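The node-unrolling idea from the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not KernelHive's actual API: the Node type, the unrollable flag, and the unroll helper are all hypothetical names chosen to mirror the abstract's description of expanding a graph node into as many parallel instances as there are compute devices.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    unrollable: bool = False          # node may be expanded to match available devices
    children: list = field(default_factory=list)

def unroll(node: Node, num_devices: int) -> List[Node]:
    """Expand an unrollable node into one instance per compute device;
    each instance would receive a share of the input data partition."""
    if not node.unrollable or num_devices <= 1:
        return [node]
    return [Node(f"{node.name}[{i}]") for i in range(num_devices)]

# Example: a partitioner feeding an unrollable processing node, then a merger.
partition = Node("partition")
process = Node("process", unrollable=True)
merge = Node("merge")
partition.children.append(process)
process.children.append(merge)

# With 4 devices reported by the runtime, "process" becomes 4 parallel instances.
instances = unroll(process, num_devices=4)
partition.children = instances
for inst in instances:
    inst.children.append(merge)
print([n.name for n in partition.children])
# ['process[0]', 'process[1]', 'process[2]', 'process[3]']

In the paper's setting, the degree of unrolling would be driven by the devices the scheduler selects (CPUs and GPUs across cluster nodes) rather than a hard-coded count as in this toy example.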