TY - JOUR
T1 - Attentive multi-view reinforcement learning
AU - Hu, Yueyue
AU - Sun, Shiliang
AU - Xu, Xin
AU - Zhao, Jing
N1 - Publisher Copyright:
© 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2020/11/1
Y1 - 2020/11/1
N2 - Reinforcement learning typically requires millions of training steps when learning from scratch, owing to limited observational experience. More specifically, the representation approximated by a single deep network is often insufficient for reinforcement learning agents. In this paper, we propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning framework for the first time. Under the multi-view scheme of function approximation, the proposed model approximates multiple view-specific policy or value functions in parallel by estimating middle-level representations, and integrates these functions via attention mechanisms to generate a comprehensive strategy. Furthermore, we develop multi-view generalized policy improvement to jointly optimize all policies rather than a single one. Experimental results on eight Atari benchmarks show that, compared with the single-view function approximation scheme of existing reinforcement learning methods, MvDAN outperforms state-of-the-art methods, with faster convergence and greater training stability.
AB - Reinforcement learning typically requires millions of training steps when learning from scratch, owing to limited observational experience. More specifically, the representation approximated by a single deep network is often insufficient for reinforcement learning agents. In this paper, we propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning framework for the first time. Under the multi-view scheme of function approximation, the proposed model approximates multiple view-specific policy or value functions in parallel by estimating middle-level representations, and integrates these functions via attention mechanisms to generate a comprehensive strategy. Furthermore, we develop multi-view generalized policy improvement to jointly optimize all policies rather than a single one. Experimental results on eight Atari benchmarks show that, compared with the single-view function approximation scheme of existing reinforcement learning methods, MvDAN outperforms state-of-the-art methods, with faster convergence and greater training stability.
KW - Deep reinforcement learning
KW - Function approximation
KW - Multi-view learning
KW - Representation learning
UR - https://www.scopus.com/pages/publications/85085090658
U2 - 10.1007/s13042-020-01130-6
DO - 10.1007/s13042-020-01130-6
M3 - Article
AN - SCOPUS:85085090658
SN - 1868-8071
VL - 11
SP - 2461
EP - 2474
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 11
ER -