Open Access
Investigating an API for resilient exascale computing.
Author(s) - Jon Stearley, James L. Tomkins, John P. VanDyke, Kurt Brian Ferreira, Patrick G. Bridges
Publication year - 2013
Publication title - OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)
Language(s) - English
Resource type - Reports
DOI - 10.2172/1096503
Subject(s) - computer science, resilience (materials science), overhead (engineering), exascale computing, fault tolerance, embedded system, operating system, node (physics), software, fault (geology), distributed computing, parallel computing, supercomputer, geology, materials science, engineering, structural engineering, seismology, composite material
Increased HPC capability comes with increased complexity, part counts, and fault occurrences. Increasing the resilience of systems and applications to faults is a critical requirement facing the viability of exascale systems, as the overhead of traditional checkpoint/restart is projected to outweigh its benefits due to fault rates outpacing I/O bandwidths. As faults occur and propagate throughout hardware and software layers, pervasive notification and handling mechanisms are necessary. This report describes an initial investigation of fault types and programming interfaces to mitigate them. Proof-of-concept APIs are presented for the frequent and important cases of memory errors and node failures, and a strategy proposed for filesystem failures. These involve changes to the operating system, runtime, I/O library, and application layers. While a single API for fault handling among hardware and OS and application system-wide remains elusive, the effort increased our understanding of both the mountainous challenges and the promising trailheads.
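The abstract's claim that checkpoint/restart overhead grows as fault rates outpace I/O bandwidth can be illustrated with a rough calculation. The sketch below is not taken from the report: it assumes Young's well-known first-order approximation for the optimal checkpoint interval, sqrt(2 * C * M), where C is the time to write one checkpoint and M is the system mean time between failures. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(checkpoint_cost, mtbf, interval=None):
    """Approximate fraction of wall time lost to checkpointing and rework.

    First-order model: time spent writing checkpoints (C / interval) plus
    the expected lost work per failure (half an interval, once per MTBF).
    Restart time is ignored, and the result is capped at 1.0 because the
    approximation breaks down once C is comparable to the MTBF.
    """
    if interval is None:
        interval = young_interval(checkpoint_cost, mtbf)
    waste = checkpoint_cost / interval + interval / (2.0 * mtbf)
    return min(1.0, waste)


if __name__ == "__main__":
    checkpoint_minutes = 10.0  # hypothetical checkpoint write time
    # Hypothetical MTBF values, shrinking as part counts grow.
    for mtbf_minutes in (1440.0, 240.0, 60.0, 30.0):
        tau = young_interval(checkpoint_minutes, mtbf_minutes)
        waste = waste_fraction(checkpoint_minutes, mtbf_minutes)
        print(f"MTBF {mtbf_minutes:7.1f} min: interval {tau:6.1f} min, "
              f"waste {waste:.0%}")
```

Under these assumed numbers, the fraction of machine time lost rises steeply as the MTBF shrinks toward the checkpoint write time, which is the regime the abstract describes as motivating alternatives to traditional checkpoint/restart.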
