
DEALING WITH NON-STATIONARITY IN MARL VIA TRUST-REGION DECOMPOSITION

  • Wenhao Li
  • Xiangfeng Wang*
  • Bo Jin*
  • Junjie Sheng
  • , Hongyuan Zha
  • *Corresponding author for this work
  • East China Normal University
  • The Chinese University of Hong Kong, Shenzhen

Research output: Contribution to conference › Paper › Peer-reviewed

Abstract

Non-stationarity is a thorny issue in cooperative multi-agent reinforcement learning (MARL). One of its causes is that agents' policies change during the learning process. Existing works have discussed various consequences of non-stationarity using several kinds of measurement indicators, which makes the objectives of existing algorithms inevitably inconsistent and disparate. In this paper, we introduce a novel notion, the δ-stationarity measurement, to explicitly measure the non-stationarity of a policy sequence, which can be further proved to be bounded by the KL-divergence of consecutive joint policies. A straightforward but highly non-trivial approach is to control the joint policies' divergence by imposing a trust-region constraint on the joint policy, but this divergence is difficult to estimate accurately. Although decomposing the joint policy and imposing trust-region constraints on the factorized policies has lower computational complexity, simple policy factorization such as the mean-field approximation leads to more considerable policy divergence, which can be considered the trust-region decomposition dilemma. We model the joint policy as a pairwise Markov random field and propose a trust-region decomposition network (TRD-Net) based on message passing to estimate the joint policy divergence more accurately. The Multi-Agent Mirror descent policy algorithm with Trust region decomposition, called MAMT, is established by adjusting the trust regions of the local policies adaptively in an end-to-end manner. MAMT can approximately constrain the consecutive joint policies' divergence to satisfy δ-stationarity and alleviate the non-stationarity problem. Our method brings noticeable and stable performance improvement compared with baselines in cooperative tasks of different complexity.
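
As a rough illustration of the quantity the abstract refers to, the sketch below computes the KL-divergence between two consecutive joint policies for a toy case with two agents and one state. It is not the paper's TRD-Net or MAMT. It only shows a standard identity: when both joint policies are fully factorized (independent agents), the joint KL-divergence equals the sum of the per-agent KL-divergences. The policy values, the function names `kl` and `joint_from_factors`, and the direction of the divergence (new relative to old) are all assumptions made for illustration.

```python
import math
from itertools import product


def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_from_factors(factors):
    """Joint action distribution of independent agents (product of local policies)."""
    return [math.prod(probs) for probs in product(*factors)]


# Hypothetical local policies of two agents at a single state,
# before and after one policy update (illustrative numbers only).
old_policies = [[0.5, 0.5], [0.7, 0.2, 0.1]]
new_policies = [[0.6, 0.4], [0.5, 0.3, 0.2]]

# Per-agent divergence between consecutive local policies.
local_kls = [kl(new, old) for new, old in zip(new_policies, old_policies)]

# Divergence between consecutive joint policies built from those factors.
joint_kl = kl(joint_from_factors(new_policies), joint_from_factors(old_policies))

print("per-agent KL:", local_kls)
print("sum of per-agent KL:", sum(local_kls))
print("joint KL:", joint_kl)
```

The two printed totals agree up to floating-point error only because both joint policies here are products of independent local policies. Once the joint policy carries dependence between agents, the identity no longer applies directly, which is the setting where the abstract proposes a more accurate estimate of the joint divergence.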

Original language: English
Publication status: Published - 2022
Event: 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Duration: 25 Apr 2022 to 29 Apr 2022

Conference

Conference: 10th International Conference on Learning Representations, ICLR 2022
Virtual, Online
Period: 25/04/22 to 29/04/22
