TY - JOUR
T1 - Smart Intra-query Fault Tolerance for Massive Parallel Processing Databases
AU - Ji, Yunhong
AU - Chai, Yunpeng
AU - Zhou, Xuan
AU - Ren, Lipeng
AU - Qin, Yajie
N1 - Publisher Copyright:
© 2019, The Author(s).
PY - 2020/3/1
Y1 - 2020/3/1
N2 - Intra-query fault tolerance has increasingly been a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance. They may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault tolerance support of low-level frameworks, such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by performing checkpointing, i.e., materializing intermediate results of selected operators. Unlike existing approaches, SIFT aims at improving the query success rate within a given time. To achieve this goal, it needs to: (1) minimize query rerunning time after encountering failures and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in real-world MPP database systems, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.
AB - Intra-query fault tolerance has increasingly been a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance. They may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault tolerance support of low-level frameworks, such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by performing checkpointing, i.e., materializing intermediate results of selected operators. Unlike existing approaches, SIFT aims at improving the query success rate within a given time. To achieve this goal, it needs to: (1) minimize query rerunning time after encountering failures and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in real-world MPP database systems, we implemented it in Greenplum. The experimental results indicate that it can effectively improve the success rate of query processing, especially when working with unreliable hardware.
KW - Fault tolerance
KW - Intra-query fault tolerance
KW - Massive parallel processing databases
KW - Pipeline
UR - https://www.scopus.com/pages/publications/85077017601
U2 - 10.1007/s41019-019-00114-z
DO - 10.1007/s41019-019-00114-z
M3 - Article
AN - SCOPUS:85077017601
SN - 2364-1185
VL - 5
SP - 65
EP - 79
JO - Data Science and Engineering
JF - Data Science and Engineering
IS - 1
ER -