TY - GEN
T1 - Negotiated Reasoning
T2 - 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
AU - Sheng, Junjie
AU - Li, Wenhao
AU - Jin, Bo
AU - Zha, Hongyuan
AU - Wang, Jun
AU - Wang, Xiangfeng
N1 - Publisher Copyright: © 2025 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).
PY - 2025
Y1 - 2025
N2 - We focus on the relative over-generalization (RO) issue in fully cooperative multi-agent reinforcement learning (MARL). Existing methods show that endowing agents with reasoning can help mitigate RO empirically, but there is little theoretical insight. We first prove that RO is avoided when agents satisfy a consistent reasoning requirement. We then propose a new negotiated reasoning framework connecting reasoning and RO with theoretical guarantees. Based on it, we develop an algorithm called Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to form a negotiation policy that provably bypasses RO under maximum-entropy policy iteration. SVNR is further parameterized with neural networks for computational efficiency. Experiments demonstrate that SVNR significantly outperforms baselines on RO-challenged tasks, confirming its advantage in achieving better cooperation.
KW - Multi-Agent Reinforcement Learning
KW - Relative Overgeneralization
UR - https://www.scopus.com/pages/publications/105009777170
M3 - Conference contribution
AN - SCOPUS:105009777170
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 2741
EP - 2743
BT - Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
A2 - Vorobeychik, Yevgeniy
A2 - Das, Sanmay
A2 - Nowe, Ann
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Y2 - 19 May 2025 through 23 May 2025
ER -