HC-Store: putting MapReduce’s foot in two camps

  • Huiju Wang*
  • Furong Li
  • Xuan Zhou
  • Yu Cao
  • Xiongpai Qin
  • Jidong Chen
  • Shan Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

MapReduce is a popular framework for large-scale data analysis. Because data access is critical to MapReduce’s performance, recent work has applied different storage models, such as column-store and PAX-store, to MapReduce platforms. However, data access patterns vary widely across queries, and no single storage model achieves optimal performance on its own. In this paper, we study how MapReduce can benefit from the presence of two different column-store models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of the incoming MapReduce tasks, HC-store determines whether to access the underlying pure column-store or PAX-store. We study the properties of the two storage models and build a cost model that decides the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store outperforms both PAX-store and column-store, especially when confronted with diverse workloads.
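
The abstract does not give the cost model itself, so the following is only an illustrative sketch of how a runtime choice between the two layouts might be made. Everything in it is an assumption introduced for illustration and is not taken from the paper: the `Table` type, the function names, the constants `seek_cost` and `stitch_cost_per_row`, and the simplifications that a PAX-store scan reads every column of each block while a pure column-store scan reads only the requested columns but pays a per-column seek and a per-row tuple reconstruction cost.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Table:
    """Hypothetical table description: row count and average bytes per value."""
    num_rows: int
    column_bytes: Dict[str, int]


def column_store_cost(table: Table, columns: List[str],
                      seek_cost: int = 5_000_000,
                      stitch_cost_per_row: int = 4) -> int:
    """Assumed cost of a pure column-store scan.

    Reads only the requested columns, pays one seek per column file, and
    pays a per-row cost to stitch each additional column back into tuples.
    """
    scan = sum(table.column_bytes[c] for c in columns) * table.num_rows
    seeks = seek_cost * len(columns)
    stitch = stitch_cost_per_row * table.num_rows * max(len(columns) - 1, 0)
    return scan + seeks + stitch


def pax_store_cost(table: Table, columns: List[str],
                   seek_cost: int = 5_000_000) -> int:
    """Assumed cost of a PAX-store scan.

    Reads every block in full (all columns) with a single seek; the
    requested columns do not change the cost under this simplification.
    """
    scan = sum(table.column_bytes.values()) * table.num_rows
    return scan + seek_cost


def choose_store(table: Table, columns: List[str]) -> str:
    """Pick the layout with the lower estimated cost for this task."""
    col = column_store_cost(table, columns)
    pax = pax_store_cost(table, columns)
    return "column-store" if col < pax else "PAX-store"


if __name__ == "__main__":
    t = Table(num_rows=1_000_000,
              column_bytes={"a": 8, "b": 8, "c": 100, "d": 8})
    print(choose_store(t, ["a"]))                 # narrow projection
    print(choose_store(t, ["a", "b", "c", "d"]))  # full-width scan
```

Under these assumed constants, a task that touches one narrow column favors the pure column-store, while a task that touches every column favors the PAX-store, which is the kind of workload-dependent trade-off the abstract describes.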

Original language: English
Pages (from-to): 859-871
Number of pages: 13
Journal: Frontiers of Computer Science
Volume: 8
Issue number: 6
State: Published - 26 Nov 2014
Externally published: Yes

Keywords

  • Cost model
  • HC-store
  • Hadoop
  • MapReduce
  • PAX-store
  • column-store
