
Scheduling algorithm based on prefetching in MapReduce clusters

  • Mingming Sun*
  • Hang Zhuang
  • Changlong Li
  • Kun Lu
  • Xuehai Zhou
  • *Corresponding author for this work
  • University of Science and Technology of China

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Because of resource competition within the cluster and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Since machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when still poses serious challenges for cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service that improves data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes for future map tasks from the currently pending tasks, and then preload the needed data into memory without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
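The predict-then-preload idea can be illustrated with a minimal sketch. This is not HPSO's actual algorithm, which the abstract does not specify. It assumes that the node whose map slot frees up soonest receives the next task, and that the time until a slot frees up is already known. The names `Node`, `MapTask`, `remaining_secs`, `pick_task`, and `plan_prefetch` are hypothetical, and adding a block to `memory_cache` stands in for an asynchronous preload.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    local_blocks: set       # input blocks stored on this node's disk
    remaining_secs: float   # assumed estimate of time until a map slot frees up
    memory_cache: set = field(default_factory=set)  # blocks preloaded into memory


@dataclass
class MapTask:
    task_id: str
    block: str              # the input block this map task reads


def pick_task(node, unassigned, nodes):
    """Choose the pending task that best fits `node` (greedy heuristic)."""
    # 1. Prefer a task whose input block is already on this node.
    for task in unassigned:
        if task.block in node.local_blocks:
            return task
    # 2. Otherwise prefer a task whose block is local to no other node,
    #    so tasks that could still run locally elsewhere are left alone.
    others = [o for o in nodes if o is not node]
    for task in unassigned:
        if not any(task.block in o.local_blocks for o in others):
            return task
    # 3. Fall back to the first pending task.
    return unassigned[0]


def plan_prefetch(pending, nodes):
    """Return a list of (task_id, node_name, needs_prefetch) tuples."""
    plan = []
    unassigned = list(pending)
    # Nodes that free up soonest are predicted to receive the next tasks.
    for node in sorted(nodes, key=lambda n: n.remaining_secs):
        if not unassigned:
            break
        task = pick_task(node, unassigned, nodes)
        unassigned.remove(task)
        needs_prefetch = task.block not in node.local_blocks
        if needs_prefetch:
            # Stand-in for starting an asynchronous preload into memory.
            node.memory_cache.add(task.block)
        plan.append((task.task_id, node.name, needs_prefetch))
    return plan
```

In this sketch a remote block is marked for preloading while the target node is still busy with its current task, so the predicted task would find its input in memory once the slot frees up. A real scheduler would also need to handle mispredictions and memory pressure, which the sketch ignores.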

Original language: English
Pages (from-to): 1109-1118
Number of pages: 10
Journal: Applied Soft Computing
Volume: 38
Publication status: Published - Jan 2016
Externally published
