TY - GEN
T1 - Optimistic recovery for iterative dataflows in action
AU - Dudoladov, Sergey
AU - Xu, Chen
AU - Schelter, Sebastian
AU - Katsifodimos, Asterios
AU - Ewen, Stephan
AU - Tzoumas, Kostas
AU - Markl, Volker
N1 - Publisher Copyright: Copyright 2015 ACM.
PY - 2015/5/27
Y1 - 2015/5/27
AB - Over the past years, parallel dataflow systems have been employed for advanced analytics in the field of data mining where many algorithms are iterative. These systems typically provide fault tolerance by periodically checkpointing the algorithm's state and, in case of failure, restoring a consistent state from a checkpoint. In prior work, we presented an optimistic recovery mechanism that in certain cases eliminates the need to checkpoint the intermediate state of an iterative algorithm. In case of failure, our mechanism uses a compensation function to transit the algorithm to a consistent state, from which the execution can continue and successfully converge. Since this recovery mechanism does not checkpoint any state, it achieves optimal failure-free performance while guaranteeing fault tolerance. In this paper, we demonstrate our recovery mechanism with the Apache Flink data processing engine. During our demonstration, attendees will be able to run graph algorithms and trigger failures to observe the algorithms recovering with compensation functions instead of checkpoints.
KW - Fault-tolerance
KW - Iterative algorithms
KW - Optimistic recovery
UR - https://www.scopus.com/pages/publications/84957567949
U2 - 10.1145/2723372.2735372
DO - 10.1145/2723372.2735372
M3 - Conference contribution
AN - SCOPUS:84957567949
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1439
EP - 1443
BT - SIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - ACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Y2 - 31 May 2015 through 4 June 2015
ER -