FATM: A failure‐aware adaptive fault tolerance model for distributed stream processing systems
Author(s) - Akber Syed Muhammad Abrar, Chen Hanhua, Jin Hai
Publication year - 2021
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.6167
Subject(s) - computer science, latency (audio), fault tolerance, stream processing, distributed computing, process (computing), low latency (capital markets), embedded system, computer network, operating system, telecommunications
Summary - Distributed Stream Processing Systems (DSPS) are widely used to process unbounded data streams in real time. Low processing latency is a fundamental requirement for DSPS applications, since it is what sustains real‐time response. Failures, which are inevitable in computing systems, severely degrade this latency. DSPS generally cope with failures by triggering periodic checkpoints, which pessimistically persist the application state so that execution can resume after a failure. Because they are triggered so frequently, periodic checkpoints incur high overheads and increase the overall execution time. Failures in real‐world systems, however, do not occur periodically, and this mismatch between fixed checkpoint intervals and actual failure distributions makes periodic checkpointing inefficient. We propose FATM, a failure‐aware adaptive fault tolerance model that triggers checkpoints in line with the underlying failure rate. We also design a model of the utility factor and checkpoint overheads for evaluating the performance of fault tolerance models for DSPS. We implement FATM atop Apache Flink and run a series of experiments, comparing the results with the existing checkpoint‐based models of DSPS to validate its effectiveness. The results show that FATM significantly reduces the checkpoint frequency, increases the utility factor, and reduces the checkpoint overheads by 28%.
