Abstract
In recent years, reinforcement learning has achieved excellent results in tasks with low-dimensional, static action spaces, such as games and simple robotics. In practical tasks, however, the action space is usually composite, consisting of multiple sub-actions with different functions, and time-varying. Existing sub-actions may become temporarily invalid because of the external environment, while unseen sub-actions may be added to the current system. To address the robustness and transferability problems that arise in time-varying composite action spaces, we propose SCORE, a structured cooperative reinforcement learning algorithm based on the centralized-critic, decentralized-actor framework. We model the single-agent problem with a composite action space as a fully cooperative partially observable stochastic game and employ a graph attention network to capture the dependencies between heterogeneous sub-actions. To promote tighter cooperation between the decomposed heterogeneous agents, SCORE introduces a hierarchical variational autoencoder that maps the heterogeneous sub-action spaces into a common latent action space. We also incorporate an implicit credit assignment structure into SCORE to overcome the multi-agent credit assignment problem in the fully cooperative partially observable stochastic game. Experiments on a proof-of-concept task and a precision agriculture task show that SCORE has significant advantages in robustness and transferability for time-varying composite action spaces.
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 8618-8634 |
| Number of pages | 17 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 44 |
| Issue number | 11 |
| State | Published - 1 Nov 2022 |
Keywords
- Cooperative multi-agent reinforcement learning
- composite action space
- time-varying action space
Title: Structured Cooperative Reinforcement Learning With Time-Varying Composite Action Space