Hardware implementation of fault‐tolerance in dual computer systems
Author(s) - Samet Refik
Publication year - 2009
Publication title - Quality and Reliability Engineering International
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.913
H-Index - 62
eISSN - 1099-1638
pISSN - 0748-8017
DOI - 10.1002/qre.1018
Subject(s) - fault tolerance, fault (geology), computer science, software fault tolerance, dual (grammatical number), fault coverage, fault model, reliability (semiconductor), embedded system, stuck at fault, sequence (biology), key (lock), point (geometry), fault injection, reliability engineering, real time computing, distributed computing, fault detection and isolation, engineering, software, operating system, artificial intelligence, art, mathematics, actuator, literature, biology, genetics, power (physics), geometry, quantum mechanics, electronic circuit, physics, seismology, electrical engineering, geology
In this paper, we propose an architectural design for a dual computer system (DCS) that operates in real time, with fault tolerance implemented purely in hardware. The novel design lets the hardware perform the following key services: determination of the fault type (temporary or permanent) and localization of the faulty computer, without self‐testing techniques or diagnosis routines. We also propose a non‐trivial sequence of fault‐tolerance services in which determination of the fault type and recovery of computational processes after a temporary fault are carried out before fault localization. The design has several benefits: the designed hardware shortens the recovery point time period; the proposed sequence of fault‐tolerance services reduces to two the number of logical segments that must be re‐run to recover the computational processes; and determining the fault type allows only a computer with a permanent fault to be eliminated. These contributions increase both system performance and system reliability. Copyright © 2009 John Wiley & Sons, Ltd.
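
The ordering of services described in the abstract (classify the fault and recover from a temporary one first, localize only afterwards) can be illustrated with a small software simulation. This is a minimal sketch under stated assumptions, not the paper's hardware design: the names `run_segment`, `run_program`, `healthy`, `make_transient`, `stuck`, and the `localize` callback are all hypothetical, the two computers are modelled as plain functions, a fault is treated as temporary if a single re-run from the recovery point agrees, and the sketch re-runs one segment rather than reproducing the paper's two-segment bound. The abstract does not say how localization works, so it is left as a caller-supplied placeholder.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

class Fault(Enum):
    NONE = "none"
    TEMPORARY = "temporary"
    PERMANENT = "permanent"

# Assumption: a "computer" maps (state at recovery point, segment index) to a new state.
Computer = Callable[[int, int], int]

def run_segment(a: Computer, b: Computer, state: int, seg: int,
                max_reruns: int = 1) -> Tuple[Fault, Optional[int]]:
    """Run one logical segment on both computers and classify any mismatch."""
    for attempt in range(max_reruns + 1):
        out_a, out_b = a(state, seg), b(state, seg)  # both start from the recovery point
        if out_a == out_b:
            # Agreement on the first try: no fault. Agreement on a re-run: temporary fault.
            return (Fault.NONE if attempt == 0 else Fault.TEMPORARY), out_a
    return Fault.PERMANENT, None  # mismatch persisted through every re-run

def run_program(a: Computer, b: Computer, segments: int,
                localize: Callable[[Computer, Computer, int, int], str]
                ) -> Tuple[int, List[Fault], Optional[str]]:
    """Execute all segments; localization is invoked only for a permanent fault."""
    state, log = 0, []
    for seg in range(segments):
        fault, new_state = run_segment(a, b, state, seg)
        log.append(fault)
        if fault is Fault.PERMANENT:
            # Placeholder: the real localization mechanism is not given in the abstract.
            return state, log, localize(a, b, state, seg)
        state = new_state  # commit the agreed result as the new recovery point
    return state, log, None

def healthy(state: int, seg: int) -> int:
    return state + seg + 1

def make_transient(fail_at: int) -> Computer:
    """Computer that corrupts its output exactly once, at segment `fail_at`."""
    fired = {"done": False}
    def computer(state: int, seg: int) -> int:
        if seg == fail_at and not fired["done"]:
            fired["done"] = True
            return healthy(state, seg) ^ 1  # flip the low bit once
        return healthy(state, seg)
    return computer

def stuck(state: int, seg: int) -> int:
    """Computer that is permanently faulty from segment 1 onwards."""
    return healthy(state, seg) ^ 1 if seg >= 1 else healthy(state, seg)
```

In this sketch, `run_program(healthy, make_transient(2), 4, localize)` completes all four segments with one segment logged as a temporary fault and never calls `localize`, whereas pairing `healthy` with `stuck` stops at segment 1 and hands control to `localize`. That mirrors the benefit claimed in the abstract: a computer is eliminated only when its fault is permanent.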
