Evaluating the impact of a coordinated checkpointing in distributed data streams processing systems using discrete event simulation
Author(s) -
Matheus Bernardelli de Moraes,
André Leon Sampaio Gradvohl
Publication year - 2020
Publication title -
Revista Brasileira de Computação Aplicada
Language(s) - English
Resource type - Journals
ISSN - 2176-6649
DOI - 10.5335/rbca.v12i2.10295
Subject(s) - computer science , distributed computing , dependability , stream processing , data stream mining , fault tolerance , complex event processing , data loss , latency , data stream , real time computing , process (computing) , data processing , discrete event simulation , data mining , computer network , database , simulation , telecommunications , software engineering , operating system
Coordinated Checkpointing is a fault-tolerance strategy for Data Stream Processing systems, which handle a continuous, potentially unbounded flow of data under Quality of Service requirements. Although the technique is traditional in large-scale distributed systems, there is little study of how Coordinated Checkpointing impacts stream processing in both failure-free and failure-prone environments, especially given the inherent requirement of analyzing and processing data in real time. This paper presents a study that used a discrete event simulation model to investigate the impacts of the Coordinated Checkpointing fault-tolerance strategy on a Data Stream Processing system. The results show that Coordinated Checkpointing should be avoided, since it critically impacts stream processing and the real-time analysis of data, increasing latency by up to 120% and discarding up to 95% of the processing window during a global checkpoint when a rollback-recovery is required.
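The latency effect described in the abstract can be illustrated with a minimal discrete event simulation. The sketch below is not the authors' simulation model; it is a hypothetical single-operator approximation in which the operator stalls for `ckpt_pause` time units at each coordinated checkpoint (every `ckpt_period` units), and tuples whose service would begin during a checkpoint are deferred until it completes. All parameter names and values are illustrative assumptions.

```python
def simulate(n_tuples, arrival_gap, service, ckpt_period, ckpt_pause):
    """Return the mean tuple latency for a single stream operator.

    Tuples arrive every `arrival_gap` time units and each takes `service`
    units to process. A coordinated checkpoint starts at t = k * ckpt_period
    (k >= 1) and stalls the operator for `ckpt_pause` units; for simplicity,
    a tuple already in service is assumed to finish before the checkpoint
    (i.e., the operator quiesces on tuple boundaries).
    """
    free_at = 0.0          # time at which the operator next becomes idle
    total_latency = 0.0
    for i in range(n_tuples):
        arrival = i * arrival_gap
        start = max(free_at, arrival)
        if ckpt_pause > 0:
            # If service would begin inside a checkpoint window, defer it
            # until the checkpoint finishes.
            k = int(start // ckpt_period)
            if k >= 1 and start < k * ckpt_period + ckpt_pause:
                start = k * ckpt_period + ckpt_pause
        finish = start + service
        free_at = finish
        total_latency += finish - arrival
    return total_latency / n_tuples
```

With illustrative parameters such as `simulate(1000, 1.0, 0.5, 10.0, 2.0)`, the mean latency rises well above the checkpoint-free baseline `simulate(1000, 1.0, 0.5, 10.0, 0.0)`, because tuples queue up during every global checkpoint pause; this is the qualitative effect the paper quantifies (up to 120% higher latency) with a full simulation model.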