TY - GEN
T1 - Dynamic Conservative Degree Allocation for Offline Multi-Agent Reinforcement Learning
AU - Chen, Haosheng
AU - Hua, Yun
AU - Sheng, Junjie
AU - Li, Wenhao
AU - Jin, Bo
AU - Wang, Xiangfeng
N1 - Publisher Copyright:
© 2025 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).
PY - 2025
Y1 - 2025
N2 - Offline Multi-agent Reinforcement Learning (MARL) has been designed to learn policies from pre-collected datasets without real-time interaction in multi-agent systems. A primary concern in offline MARL is the conservative degree allocation, which involves assigning different conservatism levels to agents based on their varying influence on the system. Current approaches frequently neglect this crucial aspect, resulting in suboptimal performance, particularly when agents have differing impacts on the environment. In this paper, we propose OMCDA, a novel offline MARL algorithm that addresses the issue of conservative degree allocation by assigning dynamic conservatism levels to each agent based on their individual influence on system performance. OMCDA decomposes the Q-function into two components: one for computing the return and another for capturing deviations from the behavior policy. Additionally, OMCDA employs a dynamic allocation mechanism that adjusts conservatism levels for agents based on varying impacts, while maintaining coherent credit assignment and ensuring robust system performance throughout learning. We evaluate OMCDA on MuJoCo and SMAC, showing it outperforms existing offline MARL methods in challenging tasks by effectively addressing conservative degree allocation.
KW - Multi-agent reinforcement learning
KW - Offline reinforcement learning
UR - https://www.scopus.com/pages/publications/105009770010
M3 - Conference contribution
AN - SCOPUS:105009770010
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 2457
EP - 2459
BT - Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
A2 - Vorobeychik, Yevgeniy
A2 - Das, Sanmay
A2 - Nowé, Ann
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025
Y2 - 19 May 2025 through 23 May 2025
ER -