ShmStreaming: A shared memory approach for improving Hadoop streaming performance

  • Longbin Lai
  • Jingyu Zhou
  • Long Zheng
  • Huakang Li
  • Yanchao Lu
  • Feilong Tang
  • Minyi Guo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations

Abstract

The Map-Reduce programming model is now drawing both academic and industrial attention for processing large data. Hadoop, one of the most popular implementations of the model, has been widely adopted. To support application programs written in languages other than Java, Hadoop introduces a streaming mechanism that allows it to communicate with external programs through pipes. Because of the added overhead associated with pipes and context switches, Hadoop streaming performs significantly worse than native Hadoop jobs. We propose ShmStreaming, a mechanism that takes advantage of shared memory to realize Hadoop streaming with better performance. Specifically, ShmStreaming uses shared memory to implement a lockless FIFO queue that connects Hadoop and external programs. To further reduce the number of context switches, the FIFO queue adopts a batching technique that allows multiple key-value pairs to be processed together. For typical benchmarks of word count, grep, and inverted index, experimental results show a 20-30% performance improvement compared to the native Hadoop streaming implementation.
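The queue described in the abstract can be pictured with a small sketch. The following is not the authors' implementation; it is a minimal single-producer/single-consumer ring buffer over a flat byte buffer, written under several assumptions: the class name `SpscQueue`, the header layout (two 64-bit counters followed by the data area), and the length-prefixed batch encoding are all hypothetical. It runs in a single process over a `bytearray`; a cross-process version would need a buffer in real shared memory (for example `multiprocessing.shared_memory.SharedMemory`) and atomic, ordered index updates, which plain Python does not guarantee.

```python
import struct


class SpscQueue:
    """Single-producer/single-consumer ring of batched key-value pairs.

    Assumed layout (illustrative only): [head: u64][tail: u64][data area].
    Only the consumer advances head and only the producer advances tail,
    so neither side needs a lock to update its own index.
    """

    HEADER = 16  # bytes reserved for the two 64-bit counters

    def __init__(self, buf):
        self.buf = memoryview(buf)
        self.cap = len(buf) - self.HEADER

    def _load(self, off):
        return struct.unpack_from("<Q", self.buf, off)[0]

    def _store(self, off, value):
        struct.pack_into("<Q", self.buf, off, value)

    def _write(self, pos, data):
        # Copy data into the ring, wrapping around the end if needed.
        start = pos % self.cap
        first = min(len(data), self.cap - start)
        base = self.HEADER
        self.buf[base + start:base + start + first] = data[:first]
        if first < len(data):
            self.buf[base:base + len(data) - first] = data[first:]

    def _read(self, pos, n):
        # Copy n bytes out of the ring, wrapping around the end if needed.
        start = pos % self.cap
        first = min(n, self.cap - start)
        base = self.HEADER
        head_part = bytes(self.buf[base + start:base + start + first])
        return head_part + bytes(self.buf[base:base + n - first])

    def push_batch(self, pairs):
        """Write several (key, value) byte pairs as one record."""
        payload = b"".join(
            struct.pack("<II", len(k), len(v)) + k + v for k, v in pairs
        )
        record = struct.pack("<I", len(payload)) + payload
        head, tail = self._load(0), self._load(8)
        if len(record) > self.cap - (tail - head):
            return False  # not enough free space; caller retries later
        self._write(tail, record)
        self._store(8, tail + len(record))  # publish only after the copy
        return True

    def pop_batch(self):
        """Return the oldest batch as a list of pairs, or None if empty."""
        head, tail = self._load(0), self._load(8)
        if head == tail:
            return None
        (n,) = struct.unpack("<I", self._read(head, 4))
        payload = self._read(head + 4, n)
        self._store(0, head + 4 + n)  # release the space to the producer
        pairs, off = [], 0
        while off < n:
            klen, vlen = struct.unpack_from("<II", payload, off)
            off += 8
            key = payload[off:off + klen]
            value = payload[off + klen:off + klen + vlen]
            pairs.append((key, value))
            off += klen + vlen
        return pairs


if __name__ == "__main__":
    queue = SpscQueue(bytearray(SpscQueue.HEADER + 64))
    queue.push_batch([(b"the", b"1"), (b"cat", b"1")])
    print(queue.pop_batch())
```

Grouping pairs into one record means the consumer is woken once per batch rather than once per pair, which is the intuition behind the reduced context switching that the abstract mentions.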

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Pages: 137-144
Number of pages: 8
State: Published - 2013
Externally published: Yes
Event: 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013 - Barcelona, Spain
Duration: 25 Mar 2013 to 28 Mar 2013

Publication series

Name: Proceedings - International Conference on Advanced Information Networking and Applications, AINA
ISSN (Print): 1550-445X

Conference

Conference: 27th IEEE International Conference on Advanced Information Networking and Applications, AINA 2013
Country/Territory: Spain
City: Barcelona
Period: 25/03/13 to 28/03/13

Keywords

  • Hadoop streaming
  • Map-reduce
  • Shared memory
