Approximate Reliability and Availability Models for High Availability and Fault‐tolerant Systems with Repair
Author(s) - Bowles John B., Dobbins J. Gregory
Publication year - 2004
Publication title - Quality and Reliability Engineering International
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.913
H-Index - 62
eISSN - 1099-1638
pISSN - 0748-8017
DOI - 10.1002/qre.577
Subject(s) - mean time between failures, reliability engineering, failure rate, reliability (semiconductor), fault tolerance, computer science, constant (computer programming), series (stratigraphy), engineering, paleontology, power (physics), physics, quantum mechanics, biology, programming language
Abstract - Systems designed for high availability and fault tolerance are often configured as a series combination of redundant subsystems. When a unit of a subsystem fails, the system remains operational while the failed unit is repaired; however, if too many units in a subsystem fail concurrently, the system fails. Under conditions usually met in practical situations, we show that the reliability and availability of such systems can be accurately modeled by representing each redundant subsystem with a constant, ‘effective’ failure rate equal to the inverse of the subsystem mean‐time‐to‐failure (MTTF). The approximation model is surprisingly accurate, with an error on the order of the square of the ratio of mean‐time‐to‐repair to mean‐time‐to‐failure (MTTR/MTTF), and it has wide applicability for commercial, high‐availability and fault‐tolerant computer systems. The effective subsystem failure rates can be used to: (1) evaluate the system and subsystem reliability and availability; (2) estimate the system MTTF; and (3) provide a basis for the iterative analysis of large complex systems. Some observations from renewal theory suggest that the approximate models can be used even when the unit failure rates are not constant and when the redundant units are not homogeneous. Copyright © 2004 John Wiley & Sons, Ltd.
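
The effective-failure-rate idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes the simplest redundant subsystem, two identical active units of which one must work, with a constant unit failure rate `lam`, a constant repair rate `mu`, and independent repair of each failed unit being irrelevant because only one unit can be down before the subsystem fails. Under those assumptions the textbook Markov-model result is MTTF = (3·lam + mu) / (2·lam²), and the subsystem is then treated as a single block with constant rate 1/MTTF. All function names and the numeric values are hypothetical, chosen only for illustration.

```python
import math


def duplex_mttf(lam, mu):
    """MTTF of a two-unit active-redundant subsystem with repair.

    Assumption (not from the paper): textbook Markov model with states
    "both up", "one up", "failed"; constant rates lam and mu.
    """
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def effective_failure_rate(lam, mu):
    """Constant 'effective' failure rate: the inverse of the subsystem MTTF."""
    return 1.0 / duplex_mttf(lam, mu)


def duplex_reliability_exact(lam, mu, t):
    """Exact reliability of the same Markov model, for comparison.

    The two decay rates are the roots of s^2 + (3*lam + mu)*s + 2*lam^2 = 0.
    """
    b = 3.0 * lam + mu
    c = 2.0 * lam ** 2
    s_fast = -(b + math.sqrt(b * b - 4.0 * c)) / 2.0
    s_slow = c / s_fast  # product of the roots is c; avoids cancellation
    return (s_slow * math.exp(s_fast * t)
            - s_fast * math.exp(s_slow * t)) / (s_slow - s_fast)


def series_reliability_approx(rates, t):
    """Series system of subsystems, each replaced by its effective rate."""
    return math.exp(-sum(rates) * t)


def series_mttf_approx(rates):
    """Approximate system MTTF: inverse of the summed effective rates."""
    return 1.0 / sum(rates)


if __name__ == "__main__":
    lam, mu = 1e-4, 0.1  # hypothetical: failures and repairs per hour
    rate = effective_failure_rate(lam, mu)
    t = 1000.0
    print("effective rate      :", rate)
    print("exact R(t)          :", duplex_reliability_exact(lam, mu, t))
    print("approx R(t)         :", series_reliability_approx([rate], t))
    print("system MTTF (3 subs):", series_mttf_approx([rate] * 3))
```

With repair much faster than failure (here mu is 1000 times lam), the exact and approximate reliabilities printed by the script should agree closely, which is the regime the abstract describes. Subsystems with more units, different redundancy thresholds, or non-identical units would need their own MTTF expression in place of `duplex_mttf`.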