TY - GEN
T1 - Optimized data placement for column-oriented data store in the distributed environment
AU - Zhou, Minqi
AU - Xu, Chen
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2011.
PY - 2011
Y1 - 2011
N2 - Column-oriented data storage has become a buzzword for its high efficiency in massive data access, high compression ratio on individual columns, etc. However, these initial observations turn out not to be trivially true. The seek time and bandwidth of current hard disk drives (HDDs) have increasingly become the bottleneck for massive data processing, compared with the improvements in other computer components over the past four decades. In this paper, we provide a novel data placement strategy for massive data analysis (i.e., read-optimized) based on Gray code, which greatly increases the ratio of sequential access for diverse query evaluations (e.g., range queries, partial match range queries and aggregation queries). A centralized/distributed structured index is employed in widely deployed distributed file systems (e.g., GFS), which achieves convenient management, efficient accessibility, high extendibility, etc. Detailed theoretical analysis of index extendibility, sequential access improvement and storage capacity usage for the proposed data placement strategies is provided, together with specific algorithms. Our extensive experimental studies confirm the efficiency and effectiveness of the proposed data placement methods.
AB - Column-oriented data storage has become a buzzword for its high efficiency in massive data access, high compression ratio on individual columns, etc. However, these initial observations turn out not to be trivially true. The seek time and bandwidth of current hard disk drives (HDDs) have increasingly become the bottleneck for massive data processing, compared with the improvements in other computer components over the past four decades. In this paper, we provide a novel data placement strategy for massive data analysis (i.e., read-optimized) based on Gray code, which greatly increases the ratio of sequential access for diverse query evaluations (e.g., range queries, partial match range queries and aggregation queries). A centralized/distributed structured index is employed in widely deployed distributed file systems (e.g., GFS), which achieves convenient management, efficient accessibility, high extendibility, etc. Detailed theoretical analysis of index extendibility, sequential access improvement and storage capacity usage for the proposed data placement strategies is provided, together with specific algorithms. Our extensive experimental studies confirm the efficiency and effectiveness of the proposed data placement methods.
UR - https://www.scopus.com/pages/publications/84886017996
U2 - 10.1007/978-3-642-20244-5_42
DO - 10.1007/978-3-642-20244-5_42
M3 - Conference contribution
AN - SCOPUS:84886017996
SN - 9783642202438
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 440
EP - 452
BT - Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, International Workshops
A2 - Xu, Jianliang
A2 - Yu, Ge
A2 - Zhou, Shuigeng
A2 - Unland, Rainer
PB - Springer Verlag
T2 - 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011
Y2 - 22 April 2011 through 25 April 2011
ER -