F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

  • Wenhao Li
  • Bo Jin*
  • Xiangfeng Wang*
  • Junchi Yan
  • Hongyuan Zha
  • *Corresponding author for this work
  • The Chinese University of Hong Kong, Shenzhen
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society
  • Tongji University
  • Shanghai Jiao Tong University

Research output: Contribution to journal › Article › peer-review

Abstract

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications due to non-interactivity between agents, the curse of dimensionality, and computational complexity. Hence, several decentralized MARL algorithms have been proposed. However, existing decentralized methods handle only the fully cooperative setting, in which massive amounts of information must be transmitted during training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps simplifies the calculation but introduces serious bias. This paper proposes a flexible, fully decentralized actor-critic MARL framework that can incorporate most actor-critic methods and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, the proposed framework can achieve scalability and stability in large-scale environments. The framework also reduces information transmission through a parameter-sharing mechanism and novel modeling-other-agents methods based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multiagent Particle Environment and StarCraft II show that the proposed decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.

Original language: English
Article number: 178
Journal: Journal of Machine Learning Research
Volume: 24
State: Published - 2023

Keywords

  • actor-critic
  • cooperative MARL
  • decentralized
  • primal-dual method
