TY - GEN
T1 - Linear Speedup of Incremental Aggregated Gradient Methods on Streaming Data
AU - Wang, Xiaolu
AU - Jin, Cheng
AU - Wai, Hoi To
AU - Gu, Yuantao
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization. The IAG method is well suited for the parameter server architecture as the latter can easily aggregate potentially staled gradients contributed by workers. Although the convergence of IAG in the case of deterministic gradient is well known, there are only a few results for the case of its stochastic variant based on streaming data. Considering strongly convex optimization, this paper shows that the streaming IAG method achieves linear speedup when the workers are updating frequently enough, even if the data sample distribution across workers are heterogeneous. We show that the expected squared distance to optimal solution decays at O((1+T)/(nt)), where n is the number of workers, t is the iteration number, and T/n is the update frequency of workers. Our analysis involves careful treatments of the conditional expectations with staled gradients and a recursive system with both delayed and noise terms, which are new to the analysis of IAG-type algorithms. Numerical results are presented to verify our findings.
AB - This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization. The IAG method is well suited for the parameter server architecture as the latter can easily aggregate potentially staled gradients contributed by workers. Although the convergence of IAG in the case of deterministic gradient is well known, there are only a few results for the case of its stochastic variant based on streaming data. Considering strongly convex optimization, this paper shows that the streaming IAG method achieves linear speedup when the workers are updating frequently enough, even if the data sample distribution across workers are heterogeneous. We show that the expected squared distance to optimal solution decays at O((1+T)/(nt)), where n is the number of workers, t is the iteration number, and T/n is the update frequency of workers. Our analysis involves careful treatments of the conditional expectations with staled gradients and a recursive system with both delayed and noise terms, which are new to the analysis of IAG-type algorithms. Numerical results are presented to verify our findings.
UR - https://www.scopus.com/pages/publications/85184802727
U2 - 10.1109/CDC49753.2023.10383924
DO - 10.1109/CDC49753.2023.10383924
M3 - 会议稿件
AN - SCOPUS:85184802727
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 4314
EP - 4319
BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 62nd IEEE Conference on Decision and Control, CDC 2023
Y2 - 13 December 2023 through 15 December 2023
ER -