Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space

  • Wenhao Li
  • Xiangfeng Wang*
  • Bo Jin*
  • Dijun Luo
  • Hongyuan Zha

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

In recent years, reinforcement learning has achieved excellent results in low-dimensional static action spaces such as games and simple robotics. In practical tasks, however, the action space is usually composite, composed of multiple sub-actions with different functions, and time-varying. Existing sub-actions might be temporarily invalid due to the external environment, while unseen sub-actions can be added to the current system. To solve the robustness and transferability problems in time-varying composite action spaces, we propose a structured cooperative reinforcement learning algorithm based on the centralized critic and decentralized actor framework, called SCORE. We model the single-agent problem with a composite action space as a fully cooperative partially observable stochastic game and further employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation between the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder, which maps the heterogeneous sub-action space into a common latent action space. We also incorporate an implicit credit assignment structure into SCORE to overcome the multi-agent credit assignment problem in the fully cooperative partially observable stochastic game. Experiments on a proof-of-concept task and a precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action spaces.
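As a rough illustration of the problem setting only (not the SCORE algorithm itself), the sketch below models a composite action space in which sub-actions can be temporarily disabled or newly added, with one independent decision-maker per sub-action. The class names, the sub-action names (`irrigation`, `fertilizer`, `pesticide`), and the uniform random sampling that stands in for learned decentralized actors are all assumptions made for this example; none of them come from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SubAction:
    """One component of a composite action (hypothetical structure)."""
    name: str
    choices: list
    available: bool = True


@dataclass
class CompositeActionSpace:
    """A composite action space whose set of sub-actions changes over time."""
    sub_actions: dict = field(default_factory=dict)

    def add(self, name, choices):
        # A previously unseen sub-action joins the system.
        self.sub_actions[name] = SubAction(name, list(choices))

    def set_available(self, name, available):
        # The environment temporarily disables or re-enables a sub-action.
        self.sub_actions[name].available = available

    def sample(self, rng):
        # One independent "agent" per sub-action picks its own choice.
        # Uniform sampling is a placeholder for a learned policy;
        # unavailable sub-actions are skipped.
        return {
            name: rng.choice(sa.choices)
            for name, sa in self.sub_actions.items()
            if sa.available
        }


rng = random.Random(0)
space = CompositeActionSpace()
space.add("irrigation", [0, 1, 2])                # hypothetical sub-action
space.add("fertilizer", ["none", "low", "high"])  # hypothetical sub-action

a1 = space.sample(rng)                    # both sub-actions act
space.set_available("irrigation", False)  # one sub-action becomes invalid
a2 = space.sample(rng)                    # only "fertilizer" acts
space.add("pesticide", [False, True])     # an unseen sub-action is added
a3 = space.sample(rng)                    # "fertilizer" and "pesticide" act
```

The composite action is a dictionary whose keys change from step to step, which is the difficulty the abstract describes: a policy with a fixed-size output layer cannot represent `a1`, `a2`, and `a3` without retraining or masking, whereas a per-sub-action decomposition can.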

Original language: English
Pages (from-to): 8618-8634
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 11
State: Published - 1 Nov 2022

Keywords

  • Cooperative multi-agent reinforcement learning
  • composite action space
  • time-varying action space
