Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space

  • Wenhao Li
  • Xiangfeng Wang*
  • Bo Jin*
  • Dijun Luo
  • Hongyuan Zha
  • *Corresponding author for this work
  • East China Normal University
  • Tencent
  • The Chinese University of Hong Kong, Shenzhen

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. In practical tasks, however, the action space is usually composite, consisting of multiple sub-actions with different functions, and time-varying. Existing sub-actions may become temporarily invalid due to the external environment, while unseen sub-actions may be added to the current system. To address the robustness and transferability problems in time-varying composite action spaces, we propose a structured cooperative reinforcement learning algorithm based on the centralized critic and decentralized actor framework, called SCORE. We model the single-agent problem with a composite action space as a fully cooperative partially observable stochastic game and further employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation between the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder, which maps the heterogeneous sub-action spaces into a common latent action space. We also incorporate an implicit credit assignment structure into SCORE to overcome the multi-agent credit assignment problem in the fully cooperative partially observable stochastic game. Experiments on a proof-of-concept task and a precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action spaces.

Original language: English
Pages (from-to): 8618-8634
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 11
DOI
Publication status: Published - 1 Nov 2022
