A library and environment for parallel processing in a power-limited CPU+GPU cluster environment
Author(s) -
Piotr Sienski,
Sebastian Lesniewski,
Aleksandra Hein,
Szymon Kepinski,
Wiktoria Lewicka,
Agata Geisler,
Pawel Czarnul
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3621020
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
In this article, we extend the CUDAMPILIB framework [1], which facilitates the programming of parallel applications for multi-node systems with one or more graphics processing units (GPUs) per node. The framework employs an OpenMP-extended CUDA API and uses free CPU cores that are not engaged in GPU control for additional computation. We further enhance the power-capping algorithm to consider both the CPUs and the GPUs within a cluster. To maintain the original framework's workflow, we mimicked CUDA stream functionality for CPUs, ensuring seamless integration. Given the significant architectural differences between GPUs and CPUs, we implemented a dynamic data packet sizing algorithm to prevent CPUs from becoming bottlenecks, particularly in tasks that heavily favor GPUs. To test the proposed solution comprehensively, we adapted the applications from the original CUDAMPILIB publication to run in both GPU+CPU mode and the original GPU-only mode, and implemented new ones. In total, the proposed solution was evaluated on 4 different applications and 3 different testbed environments. For each application and testbed environment, we ran experiments to determine how variables such as enabling CPU usage, node count, batch size, and power caps affect application performance. The experiments revealed performance and power efficiency gains in several cases compared to the original framework, including an average execution time reduction of 13.7% from enabling CPU computations in a 16-node environment and an average execution time reduction under a power cap of 11.6% across all tested applications and environments. Code for the framework is open source and freely available online.
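The abstract does not describe how the dynamic data packet sizing works, so the following is only a minimal illustrative sketch of one plausible approach, not the authors' algorithm: it sizes CPU packets in proportion to measured throughput so that a CPU worker finishes a packet in roughly the time a GPU needs for its own. The function names, parameters, and the smoothing rule are all hypothetical and are not part of the CUDAMPILIB API.

```python
def scaled_packet_size(gpu_packet_size, gpu_throughput, cpu_throughput, min_size=1):
    """Return a CPU packet size that takes about as long as one GPU packet.

    Hypothetical helper: throughputs are in items per second, as measured
    by the caller. The result is clamped so a slow CPU still gets work.
    """
    if gpu_packet_size <= 0 or gpu_throughput <= 0 or cpu_throughput <= 0:
        raise ValueError("packet size and throughputs must be positive")
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    scaled = int(gpu_packet_size * cpu_throughput / gpu_throughput)
    return max(min_size, scaled)


def update_throughput(previous, items_done, seconds, smoothing=0.5):
    """Blend a new throughput measurement into the running estimate.

    Assumed rule: exponential smoothing, so one unusually slow or fast
    packet does not swing the next packet size too far.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    measured = items_done / seconds
    return smoothing * measured + (1.0 - smoothing) * previous


if __name__ == "__main__":
    # A GPU processing 200 items/s with 1000-item packets, and a CPU at 10 items/s.
    print(scaled_packet_size(1000, 200.0, 10.0))
    # The CPU then finishes 40 items in 2 s; refresh its estimate.
    print(update_throughput(10.0, 40, 2.0))
```

Under these assumptions, a device that is 20 times slower receives packets 20 times smaller, which keeps it from holding up the end of a run on GPU-favored workloads.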
