TY - GEN
T1 - Hybrid Checkpointing for Iterative Processing in BSP-Based Systems
AU - Yang, Yi
AU - Xu, Chen
AU - Kong, Chao
AU - Zhou, Aoying
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Distributed iterative processing exists in various application scenarios including large-scale graph analytics and machine learning. Many systems employ bulk synchronous parallel (BSP) model to synchronize the iterations. In these BSP-based systems, the long iterative processing time in distributed environments makes the fault-tolerance crucial. Most BSP-based systems write a checkpoint in either blocking strategy or unblocking strategy to achieve fault-tolerance. However, the blocking strategy involves a checkpointing overhead in failure-free cases, whereas the unblocking strategy also incurs a recovery cost if the BSP-based system has not completed checkpointing in failure cases. Motivated by the trade-off between blocking and unblocking checkpointing, we aim to choose different checkpointing strategy when checkpoint is required during iterative processing, in order to reduce the whole execution time. In particular, we propose a checkpointing choice problem, i.e., how to choose the strategy to minimize the execution time. The challenge is to make a choice during runtime without future information. To address this problem, we provide a hybrid checkpointing, which heuristically chooses either blocking or unblocking checkpointing based on cost evaluation. Our experiments on Giraph, a typical BSP-based system, show that hybrid checkpointing outperforms blocking and unblocking checkpointing.
AB - Distributed iterative processing exists in various application scenarios including large-scale graph analytics and machine learning. Many systems employ bulk synchronous parallel (BSP) model to synchronize the iterations. In these BSP-based systems, the long iterative processing time in distributed environments makes the fault-tolerance crucial. Most BSP-based systems write a checkpoint in either blocking strategy or unblocking strategy to achieve fault-tolerance. However, the blocking strategy involves a checkpointing overhead in failure-free cases, whereas the unblocking strategy also incurs a recovery cost if the BSP-based system has not completed checkpointing in failure cases. Motivated by the trade-off between blocking and unblocking checkpointing, we aim to choose different checkpointing strategy when checkpoint is required during iterative processing, in order to reduce the whole execution time. In particular, we propose a checkpointing choice problem, i.e., how to choose the strategy to minimize the execution time. The challenge is to make a choice during runtime without future information. To address this problem, we provide a hybrid checkpointing, which heuristically chooses either blocking or unblocking checkpointing based on cost evaluation. Our experiments on Giraph, a typical BSP-based system, show that hybrid checkpointing outperforms blocking and unblocking checkpointing.
KW - BSP
KW - Checkpointing
KW - Iterative processing
UR - https://www.scopus.com/pages/publications/85115855633
U2 - 10.1007/978-3-030-87571-8_60
DO - 10.1007/978-3-030-87571-8_60
M3 - 会议稿件
AN - SCOPUS:85115855633
SN - 9783030875701
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 693
EP - 705
BT - Web Information Systems and Applications - 18th International Conference, WISA 2021, Proceedings
A2 - Xing, Chunxiao
A2 - Fu, Xiaoming
A2 - Zhang, Yong
A2 - Zhang, Guigang
A2 - Borjigin, Chaolemen
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on Web Information Systems and Applications, WISA 2021
Y2 - 24 September 2021 through 26 September 2021
ER -