Design, implementation and evaluation of ICARE: an efficient recoverable DSM
Author(s) - Kermarrec A.M., Morin C., Banâtre M.
Publication year - 1998
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/(sici)1097-024x(19980725)28:9<981::aid-spe182>3.0.co;2-x
Subject(s) - computer science , microkernel , rollback , replication (statistics) , fault tolerance , process (computing) , throughput , distributed computing , exploit , embedded system , operating system , computer security , database , wireless , statistics , database transaction , mathematics
In light of the increasing throughput of local area networks, Networks Of Workstations (NOWs) that provide a Distributed Shared Memory (DSM) have become a convenient and cheaper alternative to parallel architectures for running parallel scientific applications. However, in a system made up of a large number of components, the probability of a failure cannot be neglected, especially for long‐running applications. This paper presents the design, implementation and performance evaluation of ICARE, a page‐based recoverable DSM implemented on top of an ATM‐based NOW running the CHORUS microkernel. ICARE relies on a Backward Error Recovery (BER) mechanism and combines efficiency with high availability. Storing checkpoints in volatile memory yields a low‐cost fault‐tolerance mechanism and makes it possible to exploit the symbiotic relationship between the data replication implemented in DSM systems and the replication needed for fault tolerance. Furthermore, ICARE efficiently implements transparent process rollback recovery. Performance evaluations of the ICARE prototype, which implements the proposed algorithms, demonstrate its efficiency. © 1998 John Wiley & Sons, Ltd.
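The two ideas the abstract highlights, checkpoints kept in volatile memory and reuse of DSM page replicas as recovery data, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not ICARE's actual protocol: the class and method names (`ToyRecoverableDSM`, `checkpoint`, `fail_and_rollback`) are hypothetical, the coherence protocol is reduced to a simple write-invalidate scheme, only page contents (not process state) are rolled back, and keeping exactly two recovery copies per page on distinct nodes is an assumption chosen so that a single node failure loses no checkpointed data.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: two nodes are never "equal"
class Node:
    """One workstation in a toy NOW; all state lives in volatile memory."""
    node_id: int
    alive: bool = True
    # Working page copies: page number -> value.
    pages: dict[int, bytes] = field(default_factory=dict)
    # Checkpointed recovery copies: page number -> value.
    recovery: dict[int, bytes] = field(default_factory=dict)


class ToyRecoverableDSM:
    """Hypothetical model: each page's checkpointed value is held in the
    memory of two distinct nodes (assumption, not taken from the paper)."""

    def __init__(self, n_nodes: int):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.transfers = 0  # page copies shipped over the network by checkpoints

    def write(self, node_id: int, page: int, value: bytes) -> None:
        # Simplified write-invalidate: the writer keeps the only working copy.
        for n in self.nodes:
            n.pages.pop(page, None)
        self.nodes[node_id].pages[page] = value

    def read(self, node_id: int, page: int) -> bytes:
        # Fetch a replica from any current holder (assumes one exists).
        value = next(n.pages[page] for n in self.nodes if page in n.pages)
        self.nodes[node_id].pages[page] = value
        return value

    def checkpoint(self) -> None:
        """Coordinated checkpoint: two in-memory recovery copies per page.

        Existing replicas are reused as recovery copies; a page is shipped
        only when a single replica exists. Assumes at least two live nodes.
        """
        alive = [n for n in self.nodes if n.alive]
        holders_of: dict[int, list[Node]] = {}
        for n in alive:
            for page in n.pages:
                holders_of.setdefault(page, []).append(n)
        # Toy simplification: the old checkpoint is discarded immediately.
        for n in alive:
            n.recovery.clear()
        for page, holders in holders_of.items():
            value = holders[0].pages[page]
            keepers = holders[:2]  # reuse up to two existing replicas
            if len(keepers) < 2:
                spare = next(n for n in alive if n not in keepers)
                keepers.append(spare)
                self.transfers += 1
            for n in keepers:
                n.recovery[page] = value

    def fail_and_rollback(self, failed_id: int) -> None:
        """Lose one node's memory, then roll all survivors back."""
        dead = self.nodes[failed_id]
        dead.alive = False
        dead.pages.clear()
        dead.recovery.clear()
        for n in self.nodes:
            if n.alive:
                # Discard post-checkpoint work; recovery copies become working copies.
                n.pages = dict(n.recovery)
        # Re-establish two recovery copies per page among the survivors.
        self.checkpoint()
```

In this sketch a page that is already replicated by normal DSM reads costs nothing extra at checkpoint time, which is the kind of saving the abstract's "symbiotic relationship" refers to. A real protocol would keep the previous checkpoint until the new one has fully committed (for example with a two-phase scheme) rather than clearing it first, and would also restore process state.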