ShmStreaming: A shared memory approach for improving Hadoop streaming performance

  • Longbin Lai
  • Jingyu Zhou
  • Long Zheng
  • Huakang Li
  • Yanchao Lu
  • Feilong Tang
  • Minyi Guo
  • Shanghai Jiao Tong University
  • The University of Aizu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Map-Reduce programming model is attracting attention from both academia and industry for processing large data sets. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop provides a streaming mechanism that lets it communicate with external programs through pipes. Because of the overhead added by pipes and context switches, Hadoop streaming jobs perform significantly worse than native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to implement Hadoop streaming with better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique that allows multiple key-value pairs to be processed together. For the typical benchmarks of word count, grep, and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.
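The abstract's queue design can be illustrated with a small sketch. What follows is not the authors' implementation: it is a minimal single-producer/single-consumer ring buffer in Python, written under the assumption that the producer only ever writes the tail counter and the consumer only ever writes the head counter, so neither side needs a lock. The names `BatchQueue`, `write_batch`, and `read_batch` and the length-prefixed record layout are hypothetical. A `bytearray` stands in for the shared segment here; a real cross-process version would pass in a shared buffer instead and would also need atomic counter updates and memory barriers, which this sketch does not model.

```python
import struct

U64 = struct.Struct("<Q")   # queue counters
U32 = struct.Struct("<I")   # length prefix of each key or value
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 8, 16   # assumed layout: head, tail, data


class BatchQueue:
    """SPSC byte ring buffer; head and tail only ever increase."""

    def __init__(self, buf):
        self.buf = memoryview(buf)
        self.cap = len(self.buf) - DATA_OFF

    def _get(self, off):
        return U64.unpack_from(self.buf, off)[0]

    def _put(self, off, value):
        U64.pack_into(self.buf, off, value)

    def write_batch(self, pairs):
        """Producer side: copy a whole batch in, then publish the tail once."""
        blob = b"".join(
            U32.pack(len(field)) + field for kv in pairs for field in kv
        )
        head, tail = self._get(HEAD_OFF), self._get(TAIL_OFF)
        if len(blob) > self.cap - (tail - head):
            return False                      # not enough free space, retry later
        start = tail % self.cap
        first = min(len(blob), self.cap - start)
        self.buf[DATA_OFF + start:DATA_OFF + start + first] = blob[:first]
        if first < len(blob):                 # batch wraps around the end
            self.buf[DATA_OFF:DATA_OFF + len(blob) - first] = blob[first:]
        self._put(TAIL_OFF, tail + len(blob))  # single publish per batch
        return True

    def read_batch(self):
        """Consumer side: drain everything published so far in one pass."""
        head, tail = self._get(HEAD_OFF), self._get(TAIL_OFF)
        size = tail - head
        if size == 0:
            return []
        start = head % self.cap
        first = min(size, self.cap - start)
        blob = bytes(self.buf[DATA_OFF + start:DATA_OFF + start + first])
        blob += bytes(self.buf[DATA_OFF:DATA_OFF + size - first])
        fields, pos = [], 0
        while pos < size:
            (n,) = U32.unpack_from(blob, pos)
            fields.append(blob[pos + 4:pos + 4 + n])
            pos += 4 + n
        self._put(HEAD_OFF, tail)             # release the space to the producer
        return list(zip(fields[0::2], fields[1::2]))


# A bytearray stands in for the shared-memory segment in this sketch.
queue = BatchQueue(bytearray(DATA_OFF + 64))
queue.write_batch([(b"the", b"1"), (b"cat", b"1")])
print(queue.read_batch())
```

Because the tail is published once per batch rather than once per key-value pair, the consumer is woken (or polls successfully) far less often, which is the effect the batching technique in the abstract aims for.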

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Pages: 137-144
Number of pages: 8
DOI
Publication status: Published - 2013
Externally published
Event: 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013 - Barcelona, Spain
Duration: 25 Mar 2013 to 28 Mar 2013

Publication series

Name: Proceedings - International Conference on Advanced Information Networking and Applications, AINA
ISSN (Print): 1550-445X

Conference

Conference: 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Country/Territory: Spain
City: Barcelona
Period: 25/03/13 to 28/03/13
