A comprehensive study on fault tolerance in stream processing systems

Xiaotong Wang, Chunxi Zhang, Junhua Fang, Rong Zhang*, Weining Qian, Aoying Zhou

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

6 Scopus citations

Abstract

Stream processing has emerged as a useful technology for applications that require continuous, low-latency computation on infinite streaming data. Since stream processing systems (SPSs) usually require distributed deployment on clusters of servers to cope with large-scale data, failures of processing nodes or communication networks are especially common, and they must be handled carefully to preserve service quality. A failed system may produce wrong results or become unavailable, resulting in a decline in user experience or even significant financial loss. Hence, a large number of fault tolerance approaches have been proposed for SPSs. These approaches often prioritize specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, a systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs is lacking, which will become an obstacle to the development of SPSs. Therefore, we investigate the existing achievements and develop a taxonomy of fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored for fault tolerance, present experimental results on two representative open-source SPSs, and expose possible disadvantages in current designs. Finally, we specify future research directions in this domain.

Original language: English
Article number: 162603
Journal: Frontiers of Computer Science
Volume: 16
Issue number: 2
State: Published - Apr 2022

Keywords

  • fault tolerance
  • performance evaluation
  • stream processing
