
FRESH : Fully Reliable and Effective protection against Soft and Hard errors
Author(s) - Daehoon Son, Hwisoo So, Jinhyo Jung, Yohan Ko, Aviral Shrivastava, Kyoungwoo Lee
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3574577
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Advances in modern computing systems have led to an unprecedented spread of safety-critical applications in real-world environments. In such applications, preventing malfunctions caused by faults is a primary design concern, because a malfunction can have catastrophic results. Software-level redundant multithreading (RMT) solutions require no hardware modifications and therefore incur no hardware cost, which makes them attractive alternatives to hardware-level redundancy for hardware unreliability issues such as soft and hard errors. However, existing software-level RMT solutions provide only fault detection and rely on external schemes for error recovery. This study investigates the potential of software-level RMT schemes for complete detection of and recovery from soft and hard errors. First, a software-level triple redundant multithreading (STRMT) scheme was implemented as a baseline. It exposes the ineffectiveness of naïve STRMT, whose runtime overhead makes the application even more vulnerable than the unprotected version. Subsequently, Fully Reliable and Effective protection against Soft and Hard errors (FRESH) is introduced as a software-only RMT scheme that achieves comprehensive resilience against both soft and hard errors. The main idea of FRESH is to distribute and intertwine error detection and recovery operations among the redundant threads, building on the thread-level load-back checking of the state-of-the-art RMT scheme. FRESH further applies a lazy-fault-diagnosis optimization to reduce the number of thread-level synchronizations required for fault detection and recovery. Experimental results on a simulated microprocessor with an ARM Cortex-A53-like microarchitecture demonstrate that FRESH reduces the program failure rate by around 99.88% compared to the unprotected versions.
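
As a rough illustration of why three redundant copies allow recovery where two allow only detection, the following Python sketch runs a toy task in three threads and takes a majority vote on the results. It shows generic triple-redundancy voting only, not FRESH's load-back checking or lazy fault diagnosis, which the abstract does not detail. The names `majority_vote`, `run_triplicated`, `checksum`, and the `fault_in` injection hook are hypothetical and introduced here purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def majority_vote(values):
    """Return (winner, faulty_indices); raise if no value has a strict majority."""
    winner, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        # All replicas disagree: the error is detected but cannot be recovered.
        raise RuntimeError("no majority: error detected but not recoverable")
    faulty = [i for i, v in enumerate(values) if v != winner]
    return winner, faulty


def run_triplicated(task, arg, fault_in=None):
    """Run `task` in three redundant threads and vote on the results.

    `fault_in` is a hypothetical test hook: the index of a replica whose
    result has its lowest bit flipped, to emulate a soft error.
    """
    def replica(index):
        result = task(arg)
        return result ^ 1 if index == fault_in else result

    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(replica, range(3)))
    return majority_vote(results)


def checksum(data):
    """Toy workload: sum of the bytes modulo 2**16."""
    return sum(data) & 0xFFFF
```

Calling `run_triplicated(checksum, b"FRESH", fault_in=1)` returns the correct checksum together with `[1]`, the index of the corrupted replica. With only two replicas a mismatch could be noticed, but there would be no way to tell which copy is wrong.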