Comparative analysis of soft-error detection strategies
Author(s) -
Gökçen Kestor,
Burcu O. Mutlu,
Joseph Manzano,
Omer Subasi,
Osman Ünsal,
Sriram Krishnamoorthy
Publication year - 2018
Publication title -
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/3203217.3203240
Subject(s) - detector, computer science, soft error, iterative method, transient (computer programming), error detection and correction, iterative learning control, machine learning, artificial intelligence, algorithm, electronic engineering, engineering, telecommunications, control (management), operating system
Undetected soft errors caused by transient bit flips can lead to silent data corruption (SDC), an undesirable outcome where invalid results pass for valid ones. This has motivated the design of soft error detectors to minimize SDCs. However, the detectors have been studied in different contexts, making comparative evaluation difficult. In this paper, we present the first comprehensive evaluation of four online soft error detection techniques in detecting the adverse impact of soft errors on iterative methods. We observe that, across five iterative methods, the detectors studied achieve high but not perfect detection rates. To understand the potential for improved detection, we evaluate a machine-learning-based detector that takes as its features the runtime features that the individual detectors observe to arrive at their conclusions. Our evaluation demonstrates improved but still far from perfect detection accuracy for the machine-learning-based detectors. This extensive evaluation demonstrates the need for designing error detectors to handle the evolutionary behavior exhibited by iterative solvers.