Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System
Author(s) - Anh Nguyen-Tuong, Andrew S. Grimshaw, Mark Hyett
Publication year - 1996
Language(s) - English
DOI - 10.1109/srds.1996.10001
Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.
