TY - JOUR
T1 - Research on three-step accelerated gradient algorithm in deep learning
AU - Lian, Yongqiang
AU - Tang, Yincai
AU - Zhou, Shirong
N1 - Publisher Copyright:
© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
N2 - The gradient descent (GD) algorithm is a widely used optimisation method for training machine learning and deep learning models. In this paper, based on GD, Polyak's momentum (PM), and the Nesterov accelerated gradient (NAG), we give the convergence of these algorithms from an initial value to the optimal value of an objective function in simple quadratic form. Based on the convergence property of the quadratic function, the two sister sequences of NAG's iteration, and parallel tangent methods in neural networks, the three-step accelerated gradient (TAG) algorithm is proposed, which has three sequences rather than two sister sequences. To illustrate the performance of this algorithm, we compare the proposed algorithm with the other three algorithms on a quadratic function, high-dimensional quadratic functions, and a nonquadratic function. We then combine the TAG algorithm with the backpropagation algorithm and the stochastic gradient descent algorithm in deep learning. To facilitate the proposed algorithms, we rewrite the R package ‘neuralnet’ and extend it to ‘supneuralnet’. All deep learning algorithms in this paper are included in the ‘supneuralnet’ package. Finally, we show that our algorithms are superior to other algorithms in four case studies.
AB - The gradient descent (GD) algorithm is a widely used optimisation method for training machine learning and deep learning models. In this paper, based on GD, Polyak's momentum (PM), and the Nesterov accelerated gradient (NAG), we give the convergence of these algorithms from an initial value to the optimal value of an objective function in simple quadratic form. Based on the convergence property of the quadratic function, the two sister sequences of NAG's iteration, and parallel tangent methods in neural networks, the three-step accelerated gradient (TAG) algorithm is proposed, which has three sequences rather than two sister sequences. To illustrate the performance of this algorithm, we compare the proposed algorithm with the other three algorithms on a quadratic function, high-dimensional quadratic functions, and a nonquadratic function. We then combine the TAG algorithm with the backpropagation algorithm and the stochastic gradient descent algorithm in deep learning. To facilitate the proposed algorithms, we rewrite the R package ‘neuralnet’ and extend it to ‘supneuralnet’. All deep learning algorithms in this paper are included in the ‘supneuralnet’ package. Finally, we show that our algorithms are superior to other algorithms in four case studies.
KW - Accelerated algorithm
KW - backpropagation
KW - deep learning
KW - learning rate
KW - momentum
KW - stochastic gradient descent
UR - https://www.scopus.com/pages/publications/85096526694
U2 - 10.1080/24754269.2020.1846414
DO - 10.1080/24754269.2020.1846414
M3 - Article
AN - SCOPUS:85096526694
SN - 2475-4269
VL - 6
SP - 40
EP - 57
JO - Statistical Theory and Related Fields
JF - Statistical Theory and Related Fields
IS - 1
ER -