Fault Detection and Tolerance in Cluster of Workstations using Message Passing Interface
Author(s) - Syed Misbahuddin
Publication year - 2011
Publication title - Sir Syed University Research Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
eISSN - 2415-2048
pISSN - 1997-0641
DOI - 10.33317/ssurj.v1i1.72
Subject(s) - workstation , computer science , interface (matter) , fault tolerance , message passing interface , cluster (spacecraft) , operating system , task (project management) , computer cluster , node (physics) , message passing , parallel computing , engineering , systems engineering , bubble , structural engineering , maximum bubble pressure method
A Cluster of Workstations (COW) is a network-based multicomputer system intended to replace supercomputers. A COW operates according to Divisible Load Theory (DLT), under which a job is divided into n subtasks that are delegated to n workstations in the COW architecture. The job is complete only when every subtask is complete, so satisfactory job completion requires all workstations to be functional. A single faulty node can therefore stall the whole job unless fault avoidance and correction measures are taken. This paper presents a fault detection and fault tolerance algorithm that uses the Message Passing Interface (MPI) to identify faulty workstations and transfer the subtasks they were performing to normally working workstations. The workstations that take over these subtasks continue their original subtasks and run the reassigned ones on a time-sharing basis.
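The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, written as a plain Python simulation rather than real MPI code. It assumes that a coordinator probes every workstation and treats any node that does not respond as faulty, and that orphaned subtasks go to the least-loaded working node. The function names `detect_faulty` and `reassign`, the probe model, and the least-loaded policy are all illustrative assumptions, not the paper's method.

```python
def detect_faulty(nodes, responded):
    """Treat every node that did not answer the status probe as faulty.

    Assumption: in a real cluster the probe would be an MPI message with a
    timeout; here the set of responders is simply passed in.
    """
    return [n for n in nodes if n not in responded]


def reassign(assignments, faulty):
    """Move subtasks of faulty nodes to working nodes.

    Assumption: the target is the working node with the fewest subtasks.
    A working node keeps its original subtasks and gains the reassigned
    ones, which it would then run on a time-sharing basis.
    """
    healthy = [n for n in assignments if n not in faulty]
    if not healthy:
        raise RuntimeError("no working workstation left")
    plan = {n: list(assignments[n]) for n in healthy}
    for node in faulty:
        for task in assignments.get(node, []):
            target = min(healthy, key=lambda h: len(plan[h]))
            plan[target].append(task)
    return plan


# A job split into 4 subtasks on 4 workstations; node 2 stops responding.
assignments = {0: ["t0"], 1: ["t1"], 2: ["t2"], 3: ["t3"]}
responded = {0, 1, 3}

faulty = detect_faulty(assignments, responded)
plan = reassign(assignments, faulty)
print(faulty)  # [2]
print(plan)    # {0: ['t0', 't2'], 1: ['t1'], 3: ['t3']}
```

In this run node 0 ends up with both its own subtask and the one taken from node 2, which corresponds to the time-sharing case described in the abstract.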
