Fault Tolerance in a Mobile Agent Based Computational Grid
Author(s) -
Rafael Fernandes Lopes,
Francisco Jose da Silva e Silva
Publication year - 2006
Publication title -
Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)
Language(s) - English
Resource type - Book series
ISBN - 0-7695-2585-7
DOI - 10.1109/ccgrid.2006.137
In recent years, Grid computing has emerged as a promising way to increase processing and storage capacity through the integration and sharing of multi-institutional resources. Fault tolerance is an essential characteristic of Grid environments: because the Grid acts as a massively parallel system, the loss of computation time must be avoided. The likelihood of errors may be exacerbated by the fact that many Grid applications perform long tasks that may require several days of computation. In this paper, we describe the fault tolerance mechanism of the MAG Grid middleware, its components, and how they interact with each other. The components were developed as mobile agents, forming a multiagent society that provides fault tolerance for node and application crashes.