Reducing the de-linearization of data placement to improve deduplication performance

  • Yujuan Tan*
  • , Zhichao Yan
  • , Dan Feng
  • , E. H.M. Sha
  • , Xiongzi Ge
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

Data deduplication is a lossless compression technology that replaces the redundant data chunks with pointers pointing to the already-stored ones. Due to this intrinsic data elimination feature, the deduplication commodity would delinearize the data placement and force the data chunks that belong to the same data object to be divided into multiple separate parts. In our preliminary study, it is found that the de-linearization of the data placement would weaken the data spatial locality that is used for improving data read performance, deduplication throughput and efficiency in some deduplication approaches, which significantly affects the deduplication performance. In this paper, we first analyze the negative effect of the de-linearization of data placement to the data deduplication performance with some examples and experimental evidences, and then propose an effective approach to reduce the de-linearization of data placement by sacrificing little compression ratios. The experimental evaluation driven by the real world datasets shows that our proposed approach effectively reduces the de-linearization of the data placement and enhances the data spatial locality, which significantly improves the deduplication performances including deduplication throughput, deduplication efficiency and data read performance, while at the cost of little compression ratios.

Original languageEnglish
Title of host publicationProceedings - 2012 SC Companion
Subtitle of host publicationHigh Performance Computing, Networking Storage and Analysis, SCC 2012
Pages796-800
Number of pages5
DOIs
StatePublished - 2012
Externally publishedYes
Event2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 - Salt Lake City, UT, United States
Duration: 10 Nov 201216 Nov 2012

Publication series

NameProceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Conference

Conference2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Country/TerritoryUnited States
CitySalt Lake City, UT
Period10/11/1216/11/12

Keywords

  • data deduplication
  • data fragmentation
  • data placement
  • spatial locality

Fingerprint

Dive into the research topics of 'Reducing the de-linearization of data placement to improve deduplication performance'. Together they form a unique fingerprint.

Cite this