TY - GEN
T1 - Towards Efficient Workflow Scheduling Over Yarn Cluster Using Deep Reinforcement Learning
AU - Xue, Jianguo
AU - Wang, Ting
AU - Cai, Puyu
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Hadoop Yarn is an open-source cluster manager responsible for resource management and job scheduling. However, data-driven applications are typically organized into workflows that consist of a series of jobs with dependencies. Yarn does not manage users' workflows and only considers the current job rather than the entire workflow when scheduling. In practice, multiple workflows share the same Yarn cluster and are pre-assigned separate Yarn resource queues to avoid mutual interference. However, this coarse-grained resource division can sometimes result in low resource utilization and increased pending time of jobs on the Yarn queue. For instance, one resource queue may have exhausted its quota while still having pending jobs, while other queues may have available resources but cannot begin executing any jobs due to unfulfilled data dependencies. To address this problem, we propose a deep reinforcement learning-based workflow scheduling scheme that takes into account job dependencies, job priorities, and dynamic resource usage. The proposed approach can intelligently identify and utilize free windows of different resource queues. Our simulation results demonstrate that the proposed DRL-based workflow scheduling scheme can significantly reduce the average job latency compared to existing approaches.
KW - Deep Reinforcement Learning
KW - Workflow Scheduling
KW - Yarn Cluster
UR - https://www.scopus.com/pages/publications/85187347221
U2 - 10.1109/GLOBECOM54140.2023.10436820
DO - 10.1109/GLOBECOM54140.2023.10436820
M3 - Conference contribution
AN - SCOPUS:85187347221
T3 - Proceedings - IEEE Global Communications Conference, GLOBECOM
SP - 473
EP - 478
BT - GLOBECOM 2023 - 2023 IEEE Global Communications Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Global Communications Conference, GLOBECOM 2023
Y2 - 4 December 2023 through 8 December 2023
ER -