TY - JOUR
T1 - DFMapper:基于查询树的SQL-to-HiveQL自动翻译工具
AU - Zhang, Chengbo
AU - Yu, Huiqun
AU - Guo, Jianmei
AU - Yang, Dingyu
AU - Fan, Guisheng
N1 - Publisher Copyright:
© 2019, Editorial Department of Journal of East China University of Science and Technology. All right reserved.
PY - 2019/2/28
Y1 - 2019/2/28
N2 - Due to the increasing amount of data in storing and processing, the traditional RDBMS encounters performance bottleneck. As a data warehouse built on Hadoop for providing data analysis and summarization as a right alternative to the traditional RDBMS, Hive becomes the first choice for many enterprises to deal with big data for its massive scale out and fault tolerance capabilities. In traditional enterprises particularly, a wide variety of legacy applications depend on the traditional RDBMS. Therefore, when migrating these applications to Hive necessary, a large number of queries need to be translated, which will consume hug cost of labor and time via manual way. This paper proposes a query-tree based approach for automatically translating SQL in RDBMS into proper HiveQL. The SQL parser is applied to parsing SQL sentence to query trees that will be supplied with correspondence between tables and columns during pretreatment. By taking into account of set operations, correlated subqueries, and other structures that HiveQL support weakly, this paper proposes eight different rewriting strategies to reconstruct query trees, and in turn to transform those queries in HiveQL sentences. A translation tool called DFMapper may provide a strategies loader to dynamically adjust the specific strategies according to actual requirements, e.g., the version of Hive, SQL dialect, etc., via the modification of externalized configuration. Besides, a validator is designed to verify the accuracy of translation by comparing the result sets of queries executed in RDBMS and Hive, respectively. It is demonstrated via the experiments on the TPC-DS benchmark composed of 99 different queries and covering a varity of ANSI SQL syntax that DFMapper can correctly translate the vast majority of universal queries with strong extensibility.
AB - Due to the increasing amount of data in storing and processing, the traditional RDBMS encounters performance bottleneck. As a data warehouse built on Hadoop for providing data analysis and summarization as a right alternative to the traditional RDBMS, Hive becomes the first choice for many enterprises to deal with big data for its massive scale out and fault tolerance capabilities. In traditional enterprises particularly, a wide variety of legacy applications depend on the traditional RDBMS. Therefore, when migrating these applications to Hive necessary, a large number of queries need to be translated, which will consume hug cost of labor and time via manual way. This paper proposes a query-tree based approach for automatically translating SQL in RDBMS into proper HiveQL. The SQL parser is applied to parsing SQL sentence to query trees that will be supplied with correspondence between tables and columns during pretreatment. By taking into account of set operations, correlated subqueries, and other structures that HiveQL support weakly, this paper proposes eight different rewriting strategies to reconstruct query trees, and in turn to transform those queries in HiveQL sentences. A translation tool called DFMapper may provide a strategies loader to dynamically adjust the specific strategies according to actual requirements, e.g., the version of Hive, SQL dialect, etc., via the modification of externalized configuration. Besides, a validator is designed to verify the accuracy of translation by comparing the result sets of queries executed in RDBMS and Hive, respectively. It is demonstrated via the experiments on the TPC-DS benchmark composed of 99 different queries and covering a varity of ANSI SQL syntax that DFMapper can correctly translate the vast majority of universal queries with strong extensibility.
KW - Application migration
KW - HiveQL
KW - Query rewrite
KW - Query-tree
UR - https://www.scopus.com/pages/publications/85064737783
U2 - 10.14135/j.cnki.1006-3080.20171126002
DO - 10.14135/j.cnki.1006-3080.20171126002
M3 - 文章
AN - SCOPUS:85064737783
SN - 1006-3080
VL - 45
SP - 148
EP - 155
JO - Huadong Ligong Daxue Xuebao /Journal of East China University of Science and Technology
JF - Huadong Ligong Daxue Xuebao /Journal of East China University of Science and Technology
IS - 1
ER -