Evaluating Fault Tolerance of Distributed Stream Processing Systems

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Since failures in large-scale clusters can lead to severe performance degradation and break system availability, fault tolerance is critical for distributed stream processing systems (DSPSs). Many fault tolerance approaches have been proposed over the last decade. However, no systematic work evaluates and compares them in detail. Previous work either evaluates global performance during failure-free runtime, or merely measures throughput loss when a failure happens. This paper is the first work to propose an evaluation framework customized for quantitatively comparing the runtime overhead and recovery efficiency of fault tolerance mechanisms in DSPSs. We define three typical configurable workloads, which are widely adopted in previous DSPS evaluations. We construct five workload suites based on these three workloads to investigate the effects of different factors on fault tolerance performance. We carry out extensive experiments on two well-known open-source DSPSs. The results demonstrate the performance gap between the two systems, which is useful for choosing and evolving fault tolerance approaches.

Original language: English
Title of host publication: Web and Big Data - 4th International Joint Conference, APWeb-WAIM 2020, Proceedings
Editors: Xin Wang, Rui Zhang, Young-Koo Lee, Le Sun, Yang-Sae Moon
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 101-116
Number of pages: 16
ISBN (Print): 9783030602895
State: Published - 2020
Event: 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020 - Tianjin, China
Duration: 18 Sep 2020 to 20 Sep 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020
Country/Territory: China
City: Tianjin
Period: 18/09/20 to 20/09/20

Keywords

  • Benchmarking
  • Fault tolerance
  • Stream processing
