TY - JOUR
T1 - An efficient bulk loading approach of secondary index in distributed log-structured data stores
AU - Zhu, Yanchao
AU - Zhang, Zhao
AU - Cai, Peng
AU - Qian, Weining
AU - Zhou, Aoying
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Improving the read performance of the Log-Structured Merge (LSM) tree has attracted much attention recently, and constructing a secondary index for LSM data stores is a popular solution. Bulk loading of the secondary index is inevitable when a new application is developed on an existing LSM data store. However, to the best of our knowledge, there are few studies on bulk loading of secondary indexes in distributed LSM-trees. In this paper, we study how to improve the performance of bulk loading of secondary indexes in distributed LSM-tree data stores, and we propose an efficient bulk loading approach for secondary indexes in log-structured data stores. First, we design a secondary index structure based on the distributed LSM-tree to guarantee the scalability and consistency of the secondary index. Second, we propose an efficient framework to handle bulk loading of the secondary index in a distributed environment, which provides good load balancing for query processing by using an equal-depth histogram to capture the data distribution. Theoretical analysis and experimental results on a standard benchmark illustrate the efficacy of the proposed methods in a distributed environment.
KW - Distributed bulk loading
KW - Load balancing
KW - Secondary index
UR - https://www.scopus.com/pages/publications/85032307431
U2 - 10.1007/978-3-319-55753-3_6
DO - 10.1007/978-3-319-55753-3_6
M3 - Conference article
AN - SCOPUS:85032307431
SN - 0302-9743
VL - 10177 LNCS
SP - 87
EP - 102
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
T2 - 22nd International Conference on Database Systems for Advanced Applications, DASFAA 2017
Y2 - 27 March 2017 through 30 March 2017
ER -