DFMapper:基于查询树的SQL-to-HiveQL自动翻译工具

Translated title of the contribution: DFMapper: An Automatic Query-Tree Based SQL-to-HiveQL Translator
  • Chengbo Zhang
  • , Huiqun Yu*
  • , Jianmei Guo
  • , Dingyu Yang
  • , Guisheng Fan
  • *Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

2 Scopus citations

Abstract

Due to the increasing amount of data in storing and processing, the traditional RDBMS encounters performance bottleneck. As a data warehouse built on Hadoop for providing data analysis and summarization as a right alternative to the traditional RDBMS, Hive becomes the first choice for many enterprises to deal with big data for its massive scale out and fault tolerance capabilities. In traditional enterprises particularly, a wide variety of legacy applications depend on the traditional RDBMS. Therefore, when migrating these applications to Hive necessary, a large number of queries need to be translated, which will consume hug cost of labor and time via manual way. This paper proposes a query-tree based approach for automatically translating SQL in RDBMS into proper HiveQL. The SQL parser is applied to parsing SQL sentence to query trees that will be supplied with correspondence between tables and columns during pretreatment. By taking into account of set operations, correlated subqueries, and other structures that HiveQL support weakly, this paper proposes eight different rewriting strategies to reconstruct query trees, and in turn to transform those queries in HiveQL sentences. A translation tool called DFMapper may provide a strategies loader to dynamically adjust the specific strategies according to actual requirements, e.g., the version of Hive, SQL dialect, etc., via the modification of externalized configuration. Besides, a validator is designed to verify the accuracy of translation by comparing the result sets of queries executed in RDBMS and Hive, respectively. It is demonstrated via the experiments on the TPC-DS benchmark composed of 99 different queries and covering a varity of ANSI SQL syntax that DFMapper can correctly translate the vast majority of universal queries with strong extensibility.

Translated title of the contributionDFMapper: An Automatic Query-Tree Based SQL-to-HiveQL Translator
Original languageChinese (Traditional)
Pages (from-to)148-155
Number of pages8
JournalHuadong Ligong Daxue Xuebao /Journal of East China University of Science and Technology
Volume45
Issue number1
DOIs
StatePublished - 28 Feb 2019
Externally publishedYes

Fingerprint

Dive into the research topics of 'DFMapper: An Automatic Query-Tree Based SQL-to-HiveQL Translator'. Together they form a unique fingerprint.

Cite this