z-logo
open-access-imgOpen Access
Energy Accounting and Control with SLURM Resource and Job Management System
Author(s) -
Yiannis Georgiou,
Thomas Cadeau,
David Glesser,
Danny Auble,
Morris A. Jette,
Matthieu Hautreux
Publication year - 2014
Publication title -
lecture notes in computer science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
DOI - 10.1007/978-3-642-45249-9_7
Subject(s) - computer science , control (management) , accounting , resource management (computing) , knowledge management , operations research , engineering management , artificial intelligence , distributed computing , engineering , business
International audienceEnergy consumption has gradually become a very important parameter in High Performance Computing platforms. The Resource and Job Management System (RJMS) is the HPC middleware that is responsible for distributing computing power to applications and has knowledge of both the underlying resources and jobs needs. Therefore it is the best candidate for monitoring and controlling the energy consumption of the computations according to the job specifications. The integration of energy measurment mechanisms on RJMS and the consideration of energy consumption as a new characteristic in accounting seemed primordial at this time when energy has become a bottleneck to scalability. Since Power-Meters would be too expensive, other existing measurement models such as IPMI and RAPL can be exploited by the RJMS in order to track energy consumption and enhance the monitoring of the executions with energy considerations. In this paper we present the design and implementation of a new framework, developed upon SLURM Resource and Job Management System, which allows energy accounting per job with power profiling capabilities along with parameters for energy control features based on static frequency scaling of the CPUs. Since the goal of this work is the deployment of the framework on large petaflopic clusters such as CURIE, its cost and reliability are important issues. We evaluate the overhead of the design choices and the precision of the monitoring modes using different HPC benchmarks (Linpack, IMB, Stream) on a real-scale platform with integrated Power-meters. Our experiments show that the overhead is less than 0.6% in energy consumption and less than 0.2% in execution time while the error deviation compared to Power-meters less than 2% in most cases

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom