A library and environment for parallel processing in a power-limited CPU+GPU cluster environment



Open Access


Author(s) - Piotr Sienski, Sebastian Lesniewski, Aleksandra Hein, Szymon Kepinski, Wiktoria Lewicka, Agata Geisler, Pawel Czarnul

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3621020

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

In this article, we extend the CUDAMPILIB framework [1], which facilitates the programming of parallel applications for multi-node systems with one or more graphics processing units (GPUs) per node. The framework employs an OpenMP-extended CUDA API and uses free CPU cores not engaged in GPU control for additional computation. We further enhance the power-capping algorithm to consider both CPUs and GPUs within a cluster. To maintain the original framework's workflow, we mimicked CUDA streams functionality for CPUs, ensuring seamless integration. Given the significant architectural differences between GPUs and CPUs, we implemented a dynamic data packet sizing algorithm to prevent CPUs from becoming bottlenecks, particularly in tasks heavily favored by GPUs. To test the proposed solution comprehensively, we adapted the applications from the original CUDAMPILIB publication to run in both GPU+CPU mode and the original GPU-only mode, and implemented new ones. In total, the proposed solution was evaluated on four applications and three testbed environments. For each application and testbed environment, experiments were performed to determine how variables such as enabling CPU usage, node count, batch size, and power caps affect application performance. The experiments revealed performance and power efficiency gains in several cases compared to the original framework, including an average execution time reduction of 13.7% from enabling CPU computations in a 16-node environment and an average execution time reduction under a power cap of 11.6% across all tested applications and environments. Code for the framework is open source and freely available online.
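The abstract does not describe how the dynamic data packet sizing algorithm works, so the following is only a minimal Python sketch of one plausible approach, not the authors' method and not part of the CUDAMPILIB API. It assumes that each worker (a GPU stream or a group of CPU threads) reports how many items it processed and how long that took, and that packets are sized so every worker needs roughly the same time per packet. The names `DeviceStats`, `record`, and `packet_size` are hypothetical.

```python
class DeviceStats:
    """Running throughput estimate for one worker (hypothetical helper)."""

    def __init__(self, initial_throughput, smoothing=0.5):
        # Throughput is measured in items per second.
        self.throughput = float(initial_throughput)
        # Weight given to the newest measurement (exponential smoothing).
        self.smoothing = smoothing

    def record(self, items, seconds):
        """Blend the latest measured throughput into the running estimate."""
        measured = items / seconds
        self.throughput = (
            self.smoothing * measured + (1.0 - self.smoothing) * self.throughput
        )


def packet_size(stats, target_seconds, min_size=1, max_size=1_000_000):
    """Choose a packet size that should take about target_seconds to process.

    Slow workers (typically CPUs on GPU-friendly kernels) receive small
    packets, so they do not hold back the end of the computation.
    """
    size = int(stats.throughput * target_seconds)
    return max(min_size, min(max_size, size))


# Assumed example: both workers start from the same guess, then diverge.
gpu = DeviceStats(initial_throughput=1000.0)
cpu = DeviceStats(initial_throughput=1000.0)
gpu.record(items=10_000, seconds=1.0)  # GPU turned out to be fast
cpu.record(items=100, seconds=1.0)     # CPU turned out to be slow

gpu_packet = packet_size(gpu, target_seconds=0.5)
cpu_packet = packet_size(cpu, target_seconds=0.5)
```

In this sketch the GPU ends up with a packet ten times larger than the CPU's, which matches the stated goal of keeping CPUs from becoming bottlenecks on GPU-favored tasks. The smoothing factor and target time are illustrative values only.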
