Final Report for "Queuing Network Models of Performance of High End Computing Systems"
Author(s) - J. Galen Buckwalter
Publication year - 2005
Language(s) - English
Resource type - Reports
DOI - 10.2172/883762
Subject(s) - computer science , supercomputer , benchmark (surveying) , parallel computing , queueing theory , computation , profiling (computer programming) , software , execution time , computer cluster , distributed computing , operating system , computer network , geodesy , algorithm , geography
The primary objective of this project is to perform general research into queuing network models of the performance of high end computing systems. A related objective is to predict how increasing the number of nodes of a supercomputer decreases the running time of a user's software package, a question often referred to as the strong scaling problem. We investigate the large, MPI-based Linux cluster MCR at LLNL, running the well-known NAS Parallel Benchmark (NPB) applications. Data is collected directly from NPB and also from the low-overhead LLNL profiling tool mpiP. For each run, we break the wall clock execution time of the benchmark into four components: switch delay, MPI contention time, MPI service time, and non-MPI computation time. Switch delay is estimated from message statistics. MPI service time and non-MPI computation time are calculated directly from measurement data. MPI contention is estimated by means of a queuing network model (QNM), based in part on MPI service time. The sum of these four components, taken as a model of execution time, validates reasonably well against the measured execution time, usually within 10%. Since the number of nodes used to run the application is a major input to the model, we can use the model to predict application execution times for various numbers of nodes. We also investigate how the four components of execution time scale individually as the number of nodes increases. Switch delay and MPI service time scale regularly. MPI contention, as estimated by the QNM submodel, also follows a fairly regular pattern. However, non-MPI compute time has a somewhat irregular pattern, possibly due to caching effects in the memory hierarchy. In contrast to some other performance modeling methods, this method is relatively fast to set up, fast to calculate, and simple in its data collection, yet accurate enough to be quite useful.
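The four-component decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the abstract does not specify the form of the QNM, so the contention estimate below assumes a textbook closed queuing network (one queueing station plus a think-time delay) solved by exact Mean Value Analysis. The names `RunComponents`, `mva_contention`, and `relative_error`, and every number in the example, are hypothetical and are not measurements from MCR or NPB.

```python
from dataclasses import dataclass


@dataclass
class RunComponents:
    """Wall clock time of one run split into the four components (seconds)."""
    nodes: int
    switch_delay: float      # estimated from message statistics
    mpi_contention: float    # estimated by a queuing network model
    mpi_service: float       # taken from measurement data
    non_mpi_compute: float   # taken from measurement data

    def predicted_time(self) -> float:
        # The model's execution time is the sum of the four components.
        return (self.switch_delay + self.mpi_contention
                + self.mpi_service + self.non_mpi_compute)


def mva_contention(customers: int, service_time: float, think_time: float) -> float:
    """Queueing delay per visit at a single station, via exact Mean Value Analysis.

    ASSUMPTION: a closed network with one queueing station and one delay
    (think-time) station stands in for the report's unspecified QNM.
    """
    queue_length = 0.0
    response = service_time
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving customer sees the queue of a network
        # with one fewer customer.
        response = service_time * (1.0 + queue_length)
        throughput = n / (think_time + response)
        queue_length = throughput * response
    # Contention is the part of the response time spent waiting, not in service.
    return response - service_time


def relative_error(predicted: float, measured: float) -> float:
    """Absolute relative error of the model against a measured run time."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    contention = mva_contention(customers=8, service_time=0.5, think_time=4.0)
    run = RunComponents(nodes=8, switch_delay=1.0, mpi_contention=contention,
                        mpi_service=0.5, non_mpi_compute=4.0)
    measured = 6.0  # hypothetical measured wall clock time
    print(f"predicted = {run.predicted_time():.3f} s")
    print(f"relative error = {relative_error(run.predicted_time(), measured):.1%}")
```

Under these assumptions, re-running the calculation with a different `customers` value is one simple way to see how the contention term might change with the number of nodes, which is the kind of strong scaling prediction the abstract describes.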
