Open Access
Record-and-Replay Techniques for HPC Systems: A Survey
Author(s) - Dylan Chapp, Kento Sato, Dong H. Ahn, Michela Taufer
Publication year - 2018
Publication title - Supercomputing Frontiers and Innovations
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 16
eISSN - 2409-6008
pISSN - 2313-8734
DOI - 10.14529/jsfi180102
Subject(s) - computer science
Record-and-replay techniques provide the ability to record executions of nondeterministic applications and re-execute them identically. These techniques find use in debugging, reproducibility, and fault tolerance, especially in the presence of nondeterministic factors such as message races. Record-and-replay techniques are highly diverse in the fidelity of replay they provide, the assumptions they make about the recorded application, the programming models they target, and the runtime overheads they impose. In the high performance computing (HPC) environment, all of these factors must be considered together, which presents additional implementation challenges. In this manuscript, we survey record-and-replay techniques in terms of the programming models they target and the workloads on which they were evaluated, and we provide a categorization of these techniques that benefits application developers and researchers targeting exascale challenges. Through this survey, the manuscript answers three questions: What are the gaps in the existing space of record-and-replay techniques? What is the roadmap to widespread use of record-and-replay on production-scale HPC workloads? And what are the critical open problems that must be addressed to make record-and-replay viable at exascale?
Keywords: Reproducibility, nondeterminism, fault-tolerance, exascale, message-passing, shared memory, proxy application, HPC benchmarks
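To make the basic idea concrete, here is a minimal sketch of recording and replaying one source of nondeterminism named above, a message race on a wildcard receive. It is not taken from any surveyed tool. `Channel`, `recv_any`, and `run` are hypothetical names, the channel is an in-process toy rather than MPI, and a seeded random choice stands in for arrival-order nondeterminism. In record mode the receiver logs which sender each wildcard receive matched. In replay mode it forces the same matches.

```python
import random
from collections import deque


class Channel:
    """Hypothetical in-process stand-in for a receiver's message queue.
    Messages from the same sender stay in FIFO order (like MPI's
    non-overtaking rule); the order across senders is not fixed."""

    def __init__(self):
        self.queues = {}

    def send(self, src, payload):
        self.queues.setdefault(src, deque()).append(payload)

    def ready_sources(self):
        return sorted(s for s, q in self.queues.items() if q)

    def take(self, src):
        return src, self.queues[src].popleft()


def recv_any(channel, rng, log=None, replay=None):
    """Wildcard receive (think MPI_ANY_SOURCE).

    Record mode: match any ready sender (a random pick models the race)
    and append the chosen source to `log`.
    Replay mode: match exactly the sender stored next in `replay`."""
    if replay is not None:
        src = replay.popleft()
    else:
        src = rng.choice(channel.ready_sources())
        if log is not None:
            log.append(src)
    return channel.take(src)


def run(seed, log=None, replay=None):
    """Three senders race to deliver two messages each to one receiver."""
    rng = random.Random(seed)
    channel = Channel()
    for src in (1, 2, 3):
        for i in range(2):
            channel.send(src, f"m{src}.{i}")
    return [recv_any(channel, rng, log, replay) for _ in range(6)]


if __name__ == "__main__":
    log = []
    recorded = run(seed=1, log=log)            # record the match order
    replayed = run(seed=2, replay=deque(log))  # different seed, same order
    assert replayed == recorded
    print("recorded sources:", log)
```

The sketch logs only the sender matched by each wildcard receive, not the payloads. Re-executing the deterministic senders regenerates the payloads, so the record stays small. This is a simplification. It ignores other nondeterministic factors such as timing or thread interleavings, which real tools may also need to capture.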
