
Fault Detection and Tolerance in Cluster of Workstations using Message Passing Interface
Author(s) - Syed Misbahuddin
Publication year - 2011
Publication title - Sir Syed University Research Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
eISSN - 2415-2048
pISSN - 1997-0641
DOI - 10.33317/ssurj.v1i1.72
Subject(s) - workstation, computer science, interface, fault tolerance, message passing interface, cluster, operating system, task, computer cluster, node, message passing, parallel computing, engineering, systems engineering
A Cluster of Workstations (COW) is a network-based multicomputer system intended to replace supercomputers. A COW operates according to Divisible Load Theory (DLT), under which a job is divided into n subtasks that are delegated to n workstations in the COW architecture. The job is complete only when every subtask is complete, so satisfactory job completion requires all workstations to be functional. A single faulty node can therefore stall the entire job unless fault detection and correction measures are taken. This paper presents a fault detection and fault tolerance algorithm that uses the Message Passing Interface (MPI) to identify faulty workstations and transfer the subtasks they were performing to normally working workstations. The workstations that receive these transferred subtasks continue their own original subtasks alongside the transferred ones on a time-sharing basis.
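The detect-and-reassign idea described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's algorithm: it does not use MPI, the job is assumed to be summing a list of numbers, the liveness probe is a plain callback standing in for whatever message exchange the paper uses, and every function name (`split_job`, `detect_faulty`, `reassign`, `run_time_shared`, `run_job`) is hypothetical.

```python
from collections import deque


def split_job(data, n):
    """Divide a job (here: a list of numbers to sum) into n near-equal subtasks."""
    size, extra = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def detect_faulty(nodes, responds):
    """Return the ids of nodes that do not answer the (hypothetical) probe."""
    return [node for node in nodes if not responds(node)]


def reassign(assignments, faulty):
    """Move each faulty node's subtasks to healthy nodes in round-robin order."""
    healthy = [node for node in assignments if node not in faulty]
    if not healthy:
        raise RuntimeError("no working workstation left")
    plan = {node: list(assignments[node]) for node in healthy}
    targets = deque(healthy)
    for node in faulty:
        for subtask in assignments[node]:
            plan[targets[0]].append(subtask)
            targets.rotate(-1)  # next transferred subtask goes to the next healthy node
    return plan


def run_time_shared(subtasks):
    """Process one element of each subtask per turn to mimic time sharing."""
    totals = [0] * len(subtasks)
    iters = [iter(s) for s in subtasks]
    active = deque(range(len(subtasks)))
    while active:
        idx = active.popleft()
        try:
            totals[idx] += next(iters[idx])
            active.append(idx)  # subtask not finished: back in the queue
        except StopIteration:
            pass  # subtask finished: drop it from the rotation
    return totals


def run_job(data, n, faulty_nodes):
    """Split the job, detect faulty nodes, reassign their work, and combine results."""
    chunks = split_job(data, n)
    assignments = {node: [chunks[node]] for node in range(n)}
    # Assumption: a node is "faulty" simply if it is listed in faulty_nodes.
    faulty = detect_faulty(range(n), lambda node: node not in faulty_nodes)
    plan = reassign(assignments, set(faulty))
    partials = {node: run_time_shared(tasks) for node, tasks in plan.items()}
    return faulty, plan, sum(sum(p) for p in partials.values())
```

For instance, `run_job(list(range(1, 101)), 4, {2})` reports workstation 2 as faulty, hands its subtask to workstation 0, and still returns the full sum of 5050. Round-robin placement is just one simple choice here; the abstract does not say how the receiving workstation is selected.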