Optimized data placement for column-oriented data store in the distributed environment

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Column-oriented data storage has become a buzzword for its high efficiency in massive data access, its high compression ratio on individual columns, and so on. However, these initial observations turn out not to be trivially true. The seek time and bandwidth of current hard disk drives (HDDs) have increasingly become the bottleneck for massive data processing, compared with the improvements made to other computer components over the past four decades. In this paper, we propose a novel data placement strategy based on Gray code for massive data analysis (i.e., read-optimized workloads), which greatly increases the ratio of sequential access for diverse query evaluations (e.g., range queries, partial-match range queries, and aggregation queries). A centralized/distributed structured index is employed in widely deployed distributed file systems (e.g., GFS), providing convenient management, efficient accessibility, and high extendibility. We provide a detailed theoretical analysis of index extendibility, sequential access improvement, and storage capacity usage for the proposed data placement strategies, together with specific algorithms. Our extensive experimental studies confirm the efficiency and effectiveness of the proposed data placement methods.
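
The abstract does not spell out the placement scheme itself, so the following is only a minimal sketch of the standard binary reflected Gray code that the strategy is said to build on. It illustrates the property that usually motivates Gray-code orderings: consecutive codes differ in exactly one bit, so items that are neighbours in the ordering are also close in the encoded space. The function names (`gray_encode`, `gray_decode`, `gray_order`) are hypothetical and not taken from the paper.

```python
def gray_encode(n: int) -> int:
    """Return the binary reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_order(bits: int) -> list[int]:
    """List every `bits`-wide code in Gray order (illustrative helper only)."""
    return [gray_encode(i) for i in range(1 << bits)]


if __name__ == "__main__":
    # Print the position in the ordering next to the 3-bit code stored there.
    for rank, code in enumerate(gray_order(3)):
        print(rank, format(code, "03b"))
```

How such an ordering is mapped onto column blocks and the index is a detail of the paper that this sketch does not attempt to reproduce.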

Original language: English
Title of host publication: Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, International Workshops
Subtitle of host publication: GDB, SIM3, FlashDB, SNSMW, DaMEN, DQIS, Proceedings
Editors: Jianliang Xu, Ge Yu, Shuigeng Zhou, Rainer Unland
Publisher: Springer Verlag
Pages: 440-452
Number of pages: 13
ISBN (Print): 9783642202438
DOIs
State: Published - 2011
Event: 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011 - Hong Kong, China
Duration: 22 Apr 2011 to 25 Apr 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6637 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011
Country/Territory: China
City: Hong Kong
Period: 22/04/11 to 25/04/11
