Application kernels: HPC resources performance monitoring and variance analysis
Author(s) -
Simakov Nikolay A.,
White Joseph P.,
DeLeon Robert L.,
Ghadersohi Amin,
Furlani Thomas R.,
Jones Matthew D.,
Gallo Steven M.,
Patra Abani K.
Publication year - 2015
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3564
Subject(s) - computer science , workload , quality of service , supercomputer , performance metric , throughput , variance (accounting) , bandwidth (computing) , kernel (algebra) , computer cluster , metric (unit) , software , distributed computing , operating system , computer network , wireless , operations management , business , mathematics , management , accounting , combinatorics , economics
Summary - Application kernels are computationally lightweight benchmarks or applications run repeatedly on high performance computing (HPC) clusters to track the Quality of Service (QoS) provided to users. They have been successful in detecting a variety of hardware and software issues, some severe, that have subsequently been corrected, resulting in improved system performance and throughput. This work describes the application kernel performance monitoring module of eXtreme Data Metrics on Demand (XDMoD). Through the XDMoD framework, the application kernels have been run repeatedly on the Texas Advanced Computing Center's Stampede and Lonestar4 clusters, for a total of over 14,000 jobs. This provides a body of data on the operation of the HPC clusters that can be used to analyze statistically how application performance, as measured by metrics such as execution time and communication bandwidth, is affected by the cluster's workload. We discuss metric distributions, carry out regression and correlation analyses, and use a principal component analysis (PCA) study to describe the variance and relate it to factors such as the spatial distribution of the application in the cluster. Ultimately, these types of analyses can be used to improve the application kernel mechanism, which in turn improves the QoS that the HPC infrastructure delivers to end users. Copyright © 2015 John Wiley & Sons, Ltd.
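The regression and correlation analyses mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' code and does not use XDMoD. The function names `pearson` and `linear_fit`, the variable names, and all numeric values are assumptions made up for illustration. It relates a hypothetical per-run cluster load fraction to an application kernel's wall time.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical values for illustration only; NOT data from the paper.
cluster_load = [0.2, 0.4, 0.5, 0.7, 0.9]          # fraction of nodes busy
wall_time_s = [100.0, 104.0, 103.0, 110.0, 115.0]  # kernel execution time

r = pearson(cluster_load, wall_time_s)
slope, intercept = linear_fit(cluster_load, wall_time_s)
print(f"correlation r = {r:.3f}")
print(f"fit: wall_time = {slope:.2f} * load + {intercept:.2f}")
```

A correlation near zero would suggest that the metric is insensitive to workload, while a strong positive value for wall time would point to contention for shared resources. With real monitoring data, each list would hold one entry per application kernel run.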