Stage delay scheduling: Speeding up DAG-style data analytics jobs with resource interleaving

  • Wujie Shao
  • Fei Xu*
  • Li Chen
  • Haoyue Zheng
  • Fangming Liu

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

To increase the resource utilization of datacenters, big data analytics jobs commonly run their stages in parallel, with the stages organized into and scheduled according to a Directed Acyclic Graph (DAG). Through an in-depth analysis of the latest Alibaba cluster trace and our motivation experiments on Amazon EC2, however, we show that CPU and network resources are still under-utilized because of unwise stage scheduling, which prolongs the completion time of a DAG-style job (e.g., a Spark job). While existing work on reducing job completion time focuses on either task scheduling or job scheduling, stage scheduling has received comparatively little attention. In this paper, we design and implement DelayStage, a simple yet effective stage delay scheduling strategy that interleaves cluster resources across parallel stages, so as to increase cluster resource utilization and speed up job execution. With the aim of minimizing the makespan of parallel stages, DelayStage judiciously arranges the execution of stages in a pipelined manner to maximize the performance benefits of resource interleaving. Extensive prototype experiments on 30 Amazon EC2 instances and complementary trace-driven simulations show that DelayStage can improve cluster resource utilization by up to 81.8% and reduce job completion time by up to 41.3% compared with stock Spark and state-of-the-art stage scheduling strategies, with acceptable runtime overhead.
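
The intuition behind resource interleaving can be illustrated with a toy model. The sketch below is not the DelayStage algorithm from the paper; it assumes, purely for illustration, that each stage runs a network phase followed by a CPU phase, that each resource is shared equally among the stages currently using it, and that one unit of work takes one time unit at full rate. The function name `makespan`, the phase labels, and the workloads are all hypothetical.

```python
def makespan(stages, delays):
    """Completion time of parallel stages under fair resource sharing.

    stages: one list of (resource, work) phases per stage (toy assumption).
    delays: start time of each stage.
    """
    n = len(stages)
    remaining = [[work for _, work in s] for s in stages]
    phase = [0] * n            # index of the phase each stage is in
    finish = [None] * n        # finish time per stage, None while unfinished
    now = 0.0
    while any(f is None for f in finish):
        active = [i for i in range(n) if finish[i] is None and delays[i] <= now]
        # Count how many active stages currently use each resource.
        users = {}
        for i in active:
            res = stages[i][phase[i]][0]
            users[res] = users.get(res, 0) + 1
        rate = {i: 1.0 / users[stages[i][phase[i]][0]] for i in active}
        # Next event: a phase completes or a delayed stage starts.
        steps = [remaining[i][phase[i]] / rate[i] for i in active]
        steps += [delays[i] - now for i in range(n)
                  if finish[i] is None and delays[i] > now]
        step = min(steps)
        for i in active:
            remaining[i][phase[i]] -= step * rate[i]
            if remaining[i][phase[i]] <= 1e-9:
                phase[i] += 1
                if phase[i] == len(stages[i]):
                    finish[i] = now + step
        now += step
    return max(finish)


# Two identical stages: 4 units of network work, then 4 units of CPU work.
stages = [[("net", 4.0), ("cpu", 4.0)], [("net", 4.0), ("cpu", 4.0)]]
no_delay = makespan(stages, [0.0, 0.0])   # both contend for the same resource
# Try integer delays for the second stage and keep the best one.
best_delay = min(range(9), key=lambda d: makespan(stages, [0.0, float(d)]))
print(no_delay, best_delay, makespan(stages, [0.0, float(best_delay)]))
```

In this toy setting, starting both stages at once leaves the CPU idle while both stages contend for the network, whereas delaying the second stage lets its network phase overlap with the first stage's CPU phase and shortens the makespan. The brute-force search over delays is only a stand-in for whatever decision procedure the paper actually uses.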

Original language: English
Title of host publication: Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450362955
State: Published - 5 Aug 2019
Event: 48th International Conference on Parallel Processing, ICPP 2019 - Kyoto, Japan
Duration: 5 Aug 2019 to 8 Aug 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 48th International Conference on Parallel Processing, ICPP 2019
Country/Territory: Japan
City: Kyoto
Period: 5/08/19 to 8/08/19

Keywords

  • Big data analytics
  • Job completion time
  • Parallel stages
  • Resource interleaving
  • Stage delay scheduling
