Optimization of checkpointing/recovery strategy in cloud computing with adaptive storage management
Author(s) - Meroufel Bakhta, Belalem Ghalem
Publication year - 2018
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4906
Subject(s) - cloud computing , computer science , distributed computing , fault tolerance , replication (statistics) , reliability (semiconductor) , task (project management) , node (physics) , crash , operating system , engineering , power (physics) , statistics , physics , mathematics , systems engineering , structural engineering , quantum mechanics
Summary - Cloud computing is a type of distributed system in which services are usually offered to users under a service level agreement (SLA) contract. In this context, a fault-tolerant system that ensures reliability and service continuity becomes a major requirement. In this paper, we propose a fault tolerance strategy based on checkpointing and replication. Our approach uses a smart checkpoint infrastructure for cloud computing tasks. The checkpoints are stored on alternative VMs that have already been paid for, so a task can resume execution faster and at lower cost after a node crash. Because checkpoints are distributed and replicated, our approach also increases system reliability. The experimental results show the effectiveness of the proposed strategy in terms of energy consumption, SLA (Service Level Agreement) violations, and reliability.
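The abstract does not spell out the placement or recovery algorithm. The following Python sketch illustrates the general idea only, under assumptions that are not taken from the paper. Checkpoint replicas go only to alive VMs, other than the task's host, whose current billing period is already paid, preferring those with the most paid time left. After a crash, the task resumes from the most advanced surviving replica. If VM crashes are independent with probability p_fail, a checkpoint with k replicas survives with probability 1 - p_fail^k. All names (`VM`, `paid_until`, `place_checkpoint`, `recover`, `checkpoint_survival_probability`) and the selection rule are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VM:
    name: str
    paid_until: float        # hypothetical: end of the billing period already paid for (hours)
    free_storage_mb: float   # spare disk on this VM that can hold checkpoints
    alive: bool = True
    # task_id -> (progress fraction saved in the checkpoint, checkpoint size in MB)
    checkpoints: Dict[str, Tuple[float, float]] = field(default_factory=dict)


def _has_room(vm: VM, task_id: str, size_mb: float) -> bool:
    # An older checkpoint of the same task is overwritten, so its space counts as free.
    old = vm.checkpoints.get(task_id)
    return vm.free_storage_mb + (old[1] if old else 0.0) >= size_mb


def place_checkpoint(vms: List[VM], task_id: str, host: VM, progress: float,
                     size_mb: float, replicas: int, now: float) -> List[str]:
    """Store up to `replicas` copies on alive VMs other than the host, using only VMs
    whose current billing period is already paid (paid_until > now), preferring
    those with the most paid time left (hypothetical selection rule)."""
    candidates = [vm for vm in vms
                  if vm.alive and vm is not host and vm.paid_until > now
                  and _has_room(vm, task_id, size_mb)]
    candidates.sort(key=lambda vm: vm.paid_until, reverse=True)
    chosen = candidates[:replicas]
    for vm in chosen:
        old = vm.checkpoints.get(task_id)
        if old is not None:
            vm.free_storage_mb += old[1]  # reclaim the space of the overwritten checkpoint
        vm.checkpoints[task_id] = (progress, size_mb)
        vm.free_storage_mb -= size_mb
    return [vm.name for vm in chosen]


def recover(vms: List[VM], task_id: str) -> Optional[Tuple[str, float]]:
    """After a crash, return (VM name, progress) of the most advanced surviving
    replica; the task would be restarted on that VM so the checkpoint does not
    have to be transferred (an assumption of this sketch)."""
    best: Optional[Tuple[str, float]] = None
    for vm in vms:
        if vm.alive and task_id in vm.checkpoints:
            progress = vm.checkpoints[task_id][0]
            if best is None or progress > best[1]:
                best = (vm.name, progress)
    return best


def checkpoint_survival_probability(p_fail: float, replicas: int) -> float:
    # Assumes independent VM crashes with the same probability p_fail:
    # the checkpoint is lost only if every replica is lost.
    return 1.0 - p_fail ** replicas


# Usage: vm1 runs task "t1"; vm3's paid period has expired, so it is skipped.
vms = [VM("vm1", paid_until=3.0, free_storage_mb=500),
       VM("vm2", paid_until=2.5, free_storage_mb=500),
       VM("vm3", paid_until=0.5, free_storage_mb=500),
       VM("vm4", paid_until=1.8, free_storage_mb=300)]
placed = place_checkpoint(vms, "t1", host=vms[0], progress=0.4,
                          size_mb=200, replicas=2, now=1.0)
print(placed)                 # ['vm2', 'vm4']
vms[0].alive = False          # the host crashes
first = recover(vms, "t1")
print(first)                  # ('vm2', 0.4)
vms[1].alive = False          # the first replica holder crashes too
second = recover(vms, "t1")
print(second)                 # ('vm4', 0.4)
print(checkpoint_survival_probability(0.1, 2))  # about 0.99
```

The second crash shows why replication matters in this sketch. Losing both the host and one replica holder still leaves a checkpoint from which the task can resume.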