TY - GEN
T1 - Efficient star join for column-oriented data store in the MapReduce environment
AU - Zhu, Haitong
AU - Zhou, Minqi
AU - Xia, Fan
AU - Zhou, Aoying
PY - 2011
Y1 - 2011
N2 - MapReduce is a parallel computing paradigm that has gained a lot of attention from both industry and academia in recent years. Unlike parallel DBMSs, MapReduce makes it easier for non-experts to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. Because MapReduce processing is scan-oriented, it inevitably accesses many unnecessary data tuples, so its performance for relational operators, especially table join operators, can be enhanced dramatically. In this paper, we propose an efficient star join strategy called HdBmp join for column-oriented data stores, using a three-level content-aware index (i.e., the HdBmp Index). Armed with this index, most of the unnecessary tuples in the join processing can be filtered out, resulting in an immense reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability and effectiveness of our newly proposed join methods.
AB - MapReduce is a parallel computing paradigm that has gained a lot of attention from both industry and academia in recent years. Unlike parallel DBMSs, MapReduce makes it easier for non-experts to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. Because MapReduce processing is scan-oriented, it inevitably accesses many unnecessary data tuples, so its performance for relational operators, especially table join operators, can be enhanced dramatically. In this paper, we propose an efficient star join strategy called HdBmp join for column-oriented data stores, using a three-level content-aware index (i.e., the HdBmp Index). Armed with this index, most of the unnecessary tuples in the join processing can be filtered out, resulting in an immense reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability and effectiveness of our newly proposed join methods.
KW - Column store
KW - HdBmp index
KW - HdBmp join
KW - Star join
UR - https://www.scopus.com/pages/publications/84055190693
U2 - 10.1109/WISA.2011.10
DO - 10.1109/WISA.2011.10
M3 - Conference contribution
AN - SCOPUS:84055190693
SN - 9780769545554
T3 - Proceedings - 8th Web Information Systems and Applications Conference, WISA 2011, Workshop on Semantic Web and Ontology, SWON 2011, Workshop on Electronic Government Technology and Application, EGTA 2011
SP - 13
EP - 18
BT - Proceedings - 8th Web Information Systems and Applications Conference, WISA 2011, Workshop on Semantic Web and Ontology, SWON 2011, Workshop on Electronic Government Technology and Application, EGTA 2011
T2 - 8th Web Information Systems and Applications Conference, WISA 2011, Workshop on Semantic Web and Ontology, SWON 2011, Workshop on Electronic Government Technology and Application, EGTA 2011
Y2 - 21 October 2011 through 23 October 2011
ER -