Optimizing data placement of MapReduce on Ceph-based framework under load-balancing constraint

Edwin H.M. Sha, Yutong Liang, Weiwen Jiang, Xianzhang Chen, Qingfeng Zhuge

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Scopus citations

Abstract

Ceph is widely used as a distributed object store and file system because of its high availability, reliability, and scalability. In a Ceph deployment composed of heterogeneous clusters, the data placement strategy can greatly affect both system performance and load balancing. For a given application, it is therefore critical to find the optimal data placement in Ceph, so that the completion time of the application is minimized under the load-balancing constraint. This paper presents a novel Ceph-based framework that jointly considers load balancing and two kinds of heterogeneity: computational capacity and network bandwidth. The framework suits applications built on the principle of moving computation rather than data across clusters, such as MapReduce. Based on the Ceph-based framework and the properties of MapReduce, we formulate a Mixed Integer Linear Programming (MILP) model to obtain the optimal data placement. Because solving the MILP is computationally expensive, we also devise an efficient algorithm that obtains near-optimal solutions. Experimental results show that the proposed algorithm improves system performance by up to 25.6% compared with the original placement strategy implemented in Ceph.
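The trade-off described in the abstract can be illustrated with a toy example. The sketch below is not the paper's MILP or its algorithm; it is a simple greedy heuristic under assumptions made up for illustration: equal-sized blocks, one map task per block running on the node that stores it, a per-node speed in blocks per second, and a load-balancing cap of ceil((1 + slack) * num_blocks / num_nodes) blocks per node. Network bandwidth and replication are ignored, and the names place_blocks, speeds, and slack are hypothetical.

```python
import math


def place_blocks(num_blocks, speeds, slack=0.5):
    """Greedily place equal-sized blocks on heterogeneous nodes.

    speeds[i] is the (assumed) processing rate of node i in blocks/second.
    slack controls the (assumed) load-balancing cap: no node may hold more
    than ceil((1 + slack) * num_blocks / len(speeds)) blocks.
    Returns (blocks per node, completion time of the slowest node).
    """
    n = len(speeds)
    cap = math.ceil((1 + slack) * num_blocks / n)
    counts = [0] * n
    for _ in range(num_blocks):
        # Only nodes still under the load-balancing cap are eligible.
        candidates = [i for i in range(n) if counts[i] < cap]
        # Pick the node that would finish earliest after taking one more block.
        best = min(candidates, key=lambda i: (counts[i] + 1) / speeds[i])
        counts[best] += 1
    makespan = max(c / s for c, s in zip(counts, speeds))
    return counts, makespan


if __name__ == "__main__":
    # Loose cap: faster nodes may hold more data.
    print(place_blocks(12, [1.0, 2.0, 3.0], slack=0.5))
    # Strict cap: every node holds the same number of blocks.
    print(place_blocks(12, [1.0, 2.0, 3.0], slack=0.0))
```

In this toy setting, loosening the cap lets the faster nodes hold more blocks and shortens the completion time, while a strict cap forces an even split that leaves the slowest node as the bottleneck. An exact method such as the MILP mentioned in the abstract would search over all feasible placements instead of committing to one block at a time.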

Original language: English
Title of host publication: Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Editors: Xiaofei Liao, Robert Lovas, Xipeng Shen, Ran Zheng
Publisher: IEEE Computer Society
Pages: 585-592
Number of pages: 8
ISBN (Electronic): 9781509044573
State: Published - 2 Jul 2016
Externally published: Yes
Event: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016 - Wuhan, Hubei, China
Duration: 13 Dec 2016 to 16 Dec 2016

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 0
ISSN (Print): 1521-9097

Conference

Conference: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Country/Territory: China
City: Wuhan, Hubei
Period: 13/12/16 to 16/12/16

Keywords

  • Ceph
  • Data placement
  • Framework
  • Load balancing
  • Object storage
