
Unbinds data and tasks to improving the Hadoop performance

  • Kun Lu*
  • Dong Dai
  • Xuehai Zhou
  • Mingming Sun
  • Changlong Li
  • Hang Zhuang
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Hadoop is a popular framework that provides an easy programming interface for writing parallel programs that process large-scale data on clusters of commodity machines. Data-intensive programs make up an important part of the workload on such clusters, especially large-scale machine learning algorithms that execute the same program iteratively. Caching input data in memory is an efficient way to speed up these data-intensive programs. However, not all data can be loaded into memory because memory capacity is limited. The key challenge, therefore, is to determine accurately when data should be cached in memory and when it should be released. A further problem is that memory may not even be large enough to hold the input data of the running program, so some of that data cannot be cached at all. Prefetching is an effective remedy in this situation. We propose an unbinding technique that does not bind programs to their input data before the actual computation starts. With unbinding, Hadoop achieves better performance when using caching and prefetching. We present a Hadoop framework built on this technique, named unbinding-Hadoop, which decides the input data of each map task in the map start-up phase rather than in the job submission phase. Prefetching can also be used in unbinding-Hadoop and yields better performance than in programs without unbinding. Evaluations of this system show that unbinding-Hadoop reduces job execution time by 40.2% for WordCount and by 29.2% for the K-means algorithm.
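The abstract describes unbinding only at a high level. As a rough illustration of the general late-binding idea, here is a minimal sketch that is not the authors' unbinding-Hadoop implementation: input splits wait in a shared pool, and a split is chosen only when a map task starts on a node, preferring one already cached in that node's memory. The class `SplitPool`, its methods `take_split_for` and `prefetch_hint`, and the per-node `cache` map are all hypothetical names introduced for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SplitPool:
    """Hypothetical pool of unassigned input splits for one job.

    Splits are NOT bound to map tasks at job submission; a split is
    picked only when a map task actually starts on some node.
    """
    pending: deque = field(default_factory=deque)
    # Assumed bookkeeping: node name -> ids of splits cached in its memory.
    cache: dict = field(default_factory=dict)

    def take_split_for(self, node: str):
        """Called at map start-up: pick a split for a task on `node`.

        Prefer a split already cached on this node; otherwise fall back
        to the oldest pending split. Returns None when no work remains.
        """
        cached_here = self.cache.get(node, set())
        for split in self.pending:
            if split in cached_here:
                self.pending.remove(split)  # safe: we return right away
                return split
        return self.pending.popleft() if self.pending else None

    def prefetch_hint(self, node: str):
        """Hypothetical prefetch helper: the split `node` would receive
        next if none of its cached splits are pending, i.e. a candidate
        to load into memory before the task starts."""
        cached_here = self.cache.get(node, set())
        if any(split in cached_here for split in self.pending):
            return None  # a cached split is already waiting for this node
        return self.pending[0] if self.pending else None


if __name__ == "__main__":
    pool = SplitPool(pending=deque(["s0", "s1", "s2", "s3"]),
                     cache={"nodeA": {"s2"}, "nodeB": {"s3"}})
    print(pool.take_split_for("nodeA"))  # s2 (cached on nodeA)
    print(pool.prefetch_hint("nodeC"))   # s0 (nothing cached on nodeC)
    print(pool.take_split_for("nodeC"))  # s0
    print(pool.take_split_for("nodeB"))  # s3 (cached on nodeB)
```

Because nothing is fixed at submission time, the choice can react to whatever happens to be in memory when each task starts. Under eager binding, a task would instead be stuck with the split assigned to it up front, whether or not that split is cached where the task runs.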

Original language: English
Host publication title: 2014 IEEE/ACIS 15th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2014 - Proceedings
Editors: Satoshi Takahashi, Ju Yeon Jo
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479956043
DOI
Publication status: Published - 2014
Externally published: Yes
Event: 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, SNPD 2014 - Las Vegas, United States
Duration: 30 Jun 2014 to 2 Jul 2014

Publication series

Name: 2014 IEEE/ACIS 15th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2014 - Proceedings

Conference

Conference: 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, SNPD 2014
Country/Territory: United States
City: Las Vegas
Period: 30/06/14 to 2/07/14
