ACF2: Accelerating Checkpoint-Free Failure Recovery for Distributed Graph Processing

  • Chen Xu*
  • Yi Yang
  • Qingfeng Pan
  • Hongfu Zhou
  • *Corresponding author for this work
  • Shanghai Engineering Research Center of Big Data Management
  • East China Normal University
  • Shanghai Ruanzhong Information Technology Company Limited

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Iterative computation in distributed graph processing systems typically incurs a long runtime. Hence, it is crucial for graph processing to tolerate and quickly recover from intermittent failures. Existing solutions can be categorized into checkpoint-based and checkpoint-free solutions. The former write checkpoints periodically during execution, which leads to significant overhead. In contrast, the latter require no checkpoints: once a failure happens, they reload the input data and directly reset the values of the lost vertices. However, reloading input data involves repartitioning, which incurs additional overhead. Moreover, we observe that checkpoint-free solutions cannot effectively handle failures for graph algorithms with topological mutations. To address these issues, we propose ACF2 with a partition-aware backup strategy and an incremental protocol. In particular, the partition-aware backup strategy backs up the sub-graphs of all nodes after initial partitioning. Once a failure happens, it recovers the lost sub-graphs from the backups and then resumes computation as a checkpoint-free solution would. To effectively handle failures involving topological mutations, the incremental protocol logs topological mutations during normal execution, and these logs are exploited for recovery. We implement ACF2 based on Apache Giraph, and our experiments show that ACF2 significantly outperforms existing solutions.
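
The recovery flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Apache Giraph API; the names `Worker`, `Cluster`, `add_edge`, `remove_edge`, and `recover` are hypothetical, and an in-memory dictionary stands in for whatever reliable storage holds the backups and the mutation log. It assumes each worker owns one sub-graph stored as an adjacency map, and that lost vertex values are reset to a single initial value.

```python
import copy


class Worker:
    """Holds one sub-graph (adjacency sets) and the current vertex values."""

    def __init__(self, subgraph, init_value=0.0):
        self.adj = {v: set(ns) for v, ns in subgraph.items()}
        self.values = {v: init_value for v in self.adj}


class Cluster:
    """Hypothetical coordinator; its dictionaries stand in for reliable storage."""

    def __init__(self, partitions, init_value=0.0):
        self.init_value = init_value
        self.workers = {w: Worker(sg, init_value) for w, sg in partitions.items()}
        # Partition-aware backup: snapshot every sub-graph once, right after
        # initial partitioning, so recovery does not need to repartition.
        self.backup = copy.deepcopy(partitions)
        # Incremental protocol: per-worker log of topological mutations.
        self.mutation_log = {w: [] for w in partitions}

    def add_edge(self, w, src, dst):
        worker = self.workers[w]
        worker.adj.setdefault(src, set()).add(dst)
        worker.values.setdefault(src, self.init_value)
        self.mutation_log[w].append(("add", src, dst))

    def remove_edge(self, w, src, dst):
        self.workers[w].adj[src].discard(dst)
        self.mutation_log[w].append(("remove", src, dst))

    def recover(self, w):
        # 1. Rebuild the lost sub-graph from the backup; values are reset.
        worker = Worker(self.backup[w], self.init_value)
        # 2. Replay logged mutations to restore the current topology.
        for op, src, dst in self.mutation_log[w]:
            if op == "add":
                worker.adj.setdefault(src, set()).add(dst)
                worker.values.setdefault(src, self.init_value)
            else:
                worker.adj[src].discard(dst)
        self.workers[w] = worker
```

In this sketch, only the failed worker's sub-graph is rebuilt; surviving workers keep their state, and computation would then resume from the reset values as in a checkpoint-free scheme.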

Original language: English
Title of host publication: Web and Big Data - 6th International Joint Conference, APWeb-WAIM 2022, Proceedings
Editors: Bohan Li, Chuanqi Tao, Lin Yue, Xuming Han, Diego Calvanese, Toshiyuki Amagasa
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 45-59
Number of pages: 15
ISBN (Print): 9783031251573
DOI
Publication status: Published - 2023
Event: 6th International Joint Conference on Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM), APWeb-WAIM 2022 - Nanjing, China
Duration: 25 Nov 2022 to 27 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13421 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Joint Conference on Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM), APWeb-WAIM 2022
Country/Territory: China
City: Nanjing
Period: 25/11/22 to 27/11/22
