HPSO: Prefetching based scheduling to improve data locality for MapReduce clusters

  • Mingming Sun
  • Hang Zhuang
  • Xuehai Zhou
  • Kun Lu
  • Changlong Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Scopus citations

Abstract

Because of competition for cluster resources and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is therefore becoming one of the most critical factors affecting the performance of MapReduce clusters. Because machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when still poses serious challenges to cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service, to improve data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes to which future map tasks should be assigned and then preload the input data into memory without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
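The predict-then-preload idea in the abstract can be illustrated with a small sketch. This is not the HPSO implementation and does not use any Hadoop API: the `Node` and `MapTask` classes, the `slot_free_at` estimate, and the selection rule in `plan_prefetch` are all hypothetical, chosen only to show how a scheduler might pair upcoming free slots with pending map tasks and issue prefetch requests for inputs that are not local.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    slot_free_at: float  # hypothetical estimate: seconds until a map slot frees up
    cached_blocks: set = field(default_factory=set)  # blocks already preloaded in memory


@dataclass
class MapTask:
    task_id: str
    block: str            # id of the input block
    replicas: frozenset   # names of nodes that hold the block on local disk


def plan_prefetch(nodes, pending_tasks):
    """Pair pending map tasks with the nodes predicted to free a slot next.

    Returns (assignments, prefetches): assignments maps task_id -> node name,
    and prefetches lists (node name, block) pairs to preload ahead of launch.
    """
    names = {n.name for n in nodes}
    # Visit nodes in the order their slots are predicted to become free.
    queue = sorted(nodes, key=lambda n: n.slot_free_at)
    remaining = list(pending_tasks)
    assignments, prefetches = {}, []
    for node in queue:
        if not remaining:
            break
        # Prefer a task whose input is already on this node (disk or memory).
        local = [t for t in remaining
                 if node.name in t.replicas or t.block in node.cached_blocks]
        if local:
            task = local[0]
        else:
            # Otherwise take the task with the fewest replicas inside the
            # cluster, so tasks that could still run locally elsewhere are kept.
            task = min(remaining, key=lambda t: len(t.replicas & names))
            # Preload its block while the slot is still busy.
            prefetches.append((node.name, task.block))
            node.cached_blocks.add(task.block)
        remaining.remove(task)
        assignments[task.task_id] = node.name
    return assignments, prefetches
```

In this toy model a prefetch is only planned when no pending task is local to the node whose slot frees next, which mirrors the stated goal of hiding remote reads without postponing task launches. How HPSO actually predicts slot availability and selects tasks is described in the paper itself, not here.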

Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing - 14th International Conference, ICA3PP 2014, Proceedings
Publisher: Springer Verlag
Pages: 82-95
Number of pages: 14
Edition: PART 2
ISBN (Print): 9783319111933
DOIs
State: Published - 2014
Externally published: Yes
Event: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014 - Dalian, China
Duration: 24 Aug 2014 to 27 Aug 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8631 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2014
Country/Territory: China
City: Dalian
Period: 24/08/14 to 27/08/14

Keywords

  • Data locality
  • MapReduce clusters
  • prefetching
  • task scheduler
