Checkpointing schemes for Grid workflow systems
Author(s) -
Li Zhongwen,
Xiang Yang
Publication year - 2008
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1321
Subject(s) - computer science, workflow, distributed computing, task (project management), fault tolerance, grid, execution time, focus (optics), fault (geology), parallel computing, real time computing, database, physics, geometry, mathematics, management, seismology, optics, economics, geology
One of the major challenges to the wide use of Grid workflow systems is fault tolerance and avoidance. Checkpointing schemes provide a means of fault detection and recovery. In our research, we focus on the performance optimization of checkpointing schemes and dynamic voltage scaling (DVS) for Grid workflow systems. We propose offline checkpointing schemes with DVS and online adaptive checkpointing schemes that dynamically adjust the checkpointing intervals by using store checkpoints and compare checkpoints. When combined with DVS, offline adaptive checkpointing schemes are not only fault tolerant but also reduce the average execution time of tasks. These schemes make efficient use of comparison and storage operations and significantly improve performance. Further, these schemes can calculate the optimal number of checkpoints that minimizes the mean execution time. We also extend the online adaptive checkpointing schemes from single‐task execution scenarios to multi‐task execution scenarios. Simulation results show that these online schemes substantially increase the likelihood of timely task completion when faults occur. Copyright © 2008 John Wiley & Sons, Ltd.
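The idea of an optimal number of checkpoints that minimizes mean execution time can be illustrated with a minimal sketch. It does not reproduce the schemes in the paper: it assumes a simplified textbook model in which faults arrive as a Poisson process, a fault is detected immediately, the task rolls back to the last checkpoint at no extra cost, and checkpoints are equally spaced. The names `expected_time`, `best_checkpoint_count`, and all parameters are hypothetical.

```python
import math


def expected_time(work, n, fault_rate, checkpoint_cost):
    """Expected completion time when the task is split into n equal segments.

    Assumed model (not taken from the paper): each segment is followed by a
    checkpoint costing `checkpoint_cost`; a fault during a segment restarts
    that segment from its beginning; faults are Poisson with `fault_rate`.
    """
    segment = work / n + checkpoint_cost
    if fault_rate == 0:
        # Without faults the cost is the work plus the checkpoint overhead.
        return n * segment
    # Expected time to finish a span of length s under restart-on-fault
    # with exponentially distributed faults is (exp(rate * s) - 1) / rate.
    return n * math.expm1(fault_rate * segment) / fault_rate


def best_checkpoint_count(work, fault_rate, checkpoint_cost, max_n=1000):
    """Brute-force search for the segment count with the lowest expected time."""
    return min(
        range(1, max_n + 1),
        key=lambda n: expected_time(work, n, fault_rate, checkpoint_cost),
    )


if __name__ == "__main__":
    n = best_checkpoint_count(work=1000.0, fault_rate=0.01, checkpoint_cost=5.0)
    print(n, expected_time(1000.0, n, 0.01, 5.0))
```

Under these assumptions, too few checkpoints make each rollback expensive and too many make the checkpoint overhead dominate, so the expected time has a minimum at an intermediate count. With a fault rate of zero the search returns a single segment, since checkpoints then add only overhead.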