TY - JOUR
T1 - Learning Optimal “Pigovian Tax” in Sequential Social Dilemmas
AU - Hua, Yun
AU - Jin, Bo
AU - Gao, Shang
AU - Wang, Xiangfeng
AU - Li, Wenhao
AU - Zha, Hongyuan
N1 - Publisher Copyright: © 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2023
Y1 - 2023
AB - In multi-agent reinforcement learning (MARL), each agent acts to maximize its individual accumulated reward. Nevertheless, individual accumulated rewards cannot fully reflect how other agents perceive them, resulting in selfish behaviors that undermine global performance and give rise to social dilemmas. This paper adapts the well-known externality theory from economics to analyze social dilemmas in MARL and proposes a method called Learning Optimal Pigovian Tax (LOPT) to internalize the externalities in MARL. Furthermore, a reward shaping mechanism based on the approximated optimal “Pigovian Tax” is applied to reduce the social cost of each agent and to alleviate the social dilemmas. Compared with existing state-of-the-art methods, the proposed LOPT leads to higher collective social welfare in both the Escape Room and the Cleanup environments, which shows the superiority of our method in solving social dilemmas.
KW - Externality
KW - Multi-Agent Reinforcement Learning
KW - Reward Shaping
KW - Sequential Social Dilemmas
UR - https://www.scopus.com/pages/publications/85171278554
M3 - Conference article
AN - SCOPUS:85171278554
SN - 1548-8403
VL - 2023-May
SP - 2784
EP - 2786
JO - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
JF - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
T2 - 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023
Y2 - 29 May 2023 through 2 June 2023
ER -