TY - GEN
T1 - Accelerating Synchronous Distributed Data Parallel Training with Small Batch Sizes
AU - Sun, Yushu
AU - Bi, Nifei
AU - Xu, Chen
AU - Niu, Yuean
AU - Zhou, Hongfu
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
N2 - Synchronous distributed data parallel (SDDP) training is widely employed in distributed deep learning systems to train DNN models on large datasets. The performance of SDDP training essentially depends on the communication overhead and the statistical efficiency. However, existing approaches optimize only one of the two, either the communication overhead or the statistical efficiency, to accelerate SDDP training. In this paper, we combine the advantages of these approaches and design a new approach, namely SkipSMA, that benefits from both low communication overhead and high statistical efficiency. In particular, we exploit a skipping strategy with an adaptive interval to decrease the communication frequency, which guarantees low communication overhead. Moreover, we employ a correction technique to mitigate the divergence while keeping small batch sizes, which ensures high statistical efficiency. To demonstrate the performance of SkipSMA, we integrate it into TensorFlow. Our experiments show that SkipSMA outperforms the state-of-the-art solutions for SDDP training, e.g., a 6.88x speedup over SSGD.
AB - Synchronous distributed data parallel (SDDP) training is widely employed in distributed deep learning systems to train DNN models on large datasets. The performance of SDDP training essentially depends on the communication overhead and the statistical efficiency. However, existing approaches optimize only one of the two, either the communication overhead or the statistical efficiency, to accelerate SDDP training. In this paper, we combine the advantages of these approaches and design a new approach, namely SkipSMA, that benefits from both low communication overhead and high statistical efficiency. In particular, we exploit a skipping strategy with an adaptive interval to decrease the communication frequency, which guarantees low communication overhead. Moreover, we employ a correction technique to mitigate the divergence while keeping small batch sizes, which ensures high statistical efficiency. To demonstrate the performance of SkipSMA, we integrate it into TensorFlow. Our experiments show that SkipSMA outperforms the state-of-the-art solutions for SDDP training, e.g., a 6.88x speedup over SSGD.
KW - Data Parallel
KW - Deep Learning Systems
KW - Distributed Training
KW - Synchronous Training
UR - https://www.scopus.com/pages/publications/85213354542
U2 - 10.1007/978-981-97-5569-1_33
DO - 10.1007/978-981-97-5569-1_33
M3 - Conference contribution
AN - SCOPUS:85213354542
SN - 9789819755684
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 503
EP - 513
BT - Database Systems for Advanced Applications - 29th International Conference, DASFAA 2024, Proceedings
A2 - Onizuka, Makoto
A2 - Xiao, Chuan
A2 - Lee, Jae-Gil
A2 - Tong, Yongxin
A2 - Ishikawa, Yoshiharu
A2 - Lu, Kejing
A2 - Amer-Yahia, Sihem
A2 - Jagadish, H.V.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on Database Systems for Advanced Applications, DASFAA 2024
Y2 - 2 July 2024 through 5 July 2024
ER -