Exploiting Unblocking Checkpoint for Fault-Tolerance in Pregel-Like Systems

  • Yi Yang
  • Zhenhua Yang
  • Chen Xu*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

With the explosive growth of graph sizes, a series of Pregel-like systems have emerged. Typically, these systems employ checkpointing and rollback mechanisms to achieve fault tolerance, in either a blocking or an unblocking manner. Blocking checkpointing pauses the iterative processing while a checkpoint is written, whereas unblocking checkpointing writes checkpoints in parallel with the iterative processing. Unblocking checkpointing decreases the checkpointing overhead, but it incurs resource contention because checkpoints are written concurrently. Hence, it may prolong both execution time and checkpointing time. In this work, we propose a queuing strategy to alleviate the contention. This strategy employs a checkpoint queue to store all pending checkpoints and writes at most a certain number of them concurrently, taking them from the queue under a First-In-First-Out (FIFO) policy. To further exploit the characteristics of checkpoints in Pregel-like systems, we define checkpoint staleness and checkpoint tardiness, and then propose a staleness/tardiness-aware skipping policy to replace the FIFO policy. Extensive experiments verify that the queuing strategy with the skipping policy outperforms both blocking and unblocking checkpointing in Pregel-like systems.
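
The FIFO variant of the queuing strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CheckpointQueue`, the parameter `max_concurrent`, and the methods `submit` and `complete` are hypothetical names chosen for illustration, and checkpoints are identified simply by superstep number. The staleness/tardiness-aware skipping policy is not sketched, because the abstract does not define those two measures.

```python
from collections import deque


class CheckpointQueue:
    """Hypothetical sketch: pending checkpoints wait in FIFO order, and at
    most `max_concurrent` of them are being written at any time."""

    def __init__(self, max_concurrent):
        if max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
        self.max_concurrent = max_concurrent
        self.pending = deque()   # checkpoints waiting to be written (FIFO)
        self.in_flight = set()   # checkpoints currently being written

    def submit(self, superstep):
        """Enqueue a superstep's checkpoint; return the writes started now."""
        self.pending.append(superstep)
        return self._dispatch()

    def complete(self, superstep):
        """Mark a write as finished; return the writes started now."""
        self.in_flight.remove(superstep)
        return self._dispatch()

    def _dispatch(self):
        # Start the oldest pending checkpoints while write slots are free.
        started = []
        while self.pending and len(self.in_flight) < self.max_concurrent:
            checkpoint = self.pending.popleft()
            self.in_flight.add(checkpoint)
            started.append(checkpoint)
        return started


queue = CheckpointQueue(max_concurrent=2)
first = queue.submit(1)        # a slot is free, so the write starts
second = queue.submit(2)       # the second slot is taken
third = queue.submit(3)        # both slots busy, checkpoint 3 waits
resumed = queue.complete(1)    # a slot frees up, checkpoint 3 starts
```

Bounding the number of in-flight writes is what limits contention in this sketch; the iterative computation itself never waits on `submit`, which keeps the checkpointing unblocking.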

Original language: English
Title of host publication: Web Information Systems Engineering - WISE 2021 - 22nd International Conference on Web Information Systems Engineering, WISE 2021, Proceedings
Editors: Wenjie Zhang, Lei Zou, Zakaria Maamar, Lu Chen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 71-86
Number of pages: 16
ISBN (Print): 9783030908874
State: Published - 2021
Event: 22nd International Conference on Web Information Systems Engineering, WISE 2021 - Melbourne, Australia
Duration: 26 Oct 2021 to 29 Oct 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13080 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Web Information Systems Engineering, WISE 2021
Country/Territory: Australia
City: Melbourne
Period: 26/10/21 to 29/10/21

Keywords

  • Checkpoint
  • Fault tolerance
  • Graph processing
