Learning Optimal “Pigovian Tax” in Sequential Social Dilemmas

  • Yun Hua
  • Bo Jin
  • Shang Gao
  • Xiangfeng Wang*
  • Wenhao Li
  • Hongyuan Zha

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

In multi-agent reinforcement learning (MARL), each agent acts to maximize its individual accumulated rewards. However, an agent's individual accumulated rewards do not fully reflect how other agents perceive its behavior, which leads to selfish behaviors that undermine global performance and gives rise to social dilemmas. This paper adapts externality theory from economics to analyze social dilemmas in MARL and proposes a method called Learning Optimal Pigovian Tax (LOPT) to internalize the externalities in MARL. A reward shaping mechanism based on the approximated optimal “Pigovian Tax” is then applied to reduce the social cost of each agent and alleviate the social dilemmas. Compared with existing state-of-the-art methods, LOPT achieves higher collective social welfare in both the Escape Room and the Cleanup environments, which shows the advantage of the method in solving social dilemmas.
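
The idea of internalizing an externality through reward shaping can be illustrated with a minimal sketch. This is not the LOPT algorithm from the paper: the function name `shape_rewards`, the `externalities` estimates, and the `tax_rate` parameter are all hypothetical, and the sketch simply assumes that an estimate of each agent's effect on the others is already available.

```python
from typing import List, Sequence


def shape_rewards(
    rewards: Sequence[float],
    externalities: Sequence[float],
    tax_rate: float = 1.0,
) -> List[float]:
    """Return shaped rewards that internalize each agent's externality.

    Assumption (not from the paper): externalities[i] is an estimate of the
    net effect of agent i's action on the other agents' rewards.
    A negative value is a harm (the agent is taxed); a positive value is a
    benefit (the agent receives an allowance).
    """
    if len(rewards) != len(externalities):
        raise ValueError("rewards and externalities must have the same length")
    # Shaped reward = own reward + scaled effect on others.
    return [r + tax_rate * e for r, e in zip(rewards, externalities)]


# Agent 0 gains 1.0 but costs the others 2.0; agent 1 gains nothing itself
# but benefits the others by 0.5.
shaped = shape_rewards([1.0, 0.0], [-2.0, 0.5])
```

Under these assumed numbers, the selfish action of agent 0 becomes unattractive once its cost to others is charged back to it, while the helpful action of agent 1 is rewarded. How the tax itself is learned is the subject of the paper and is not modeled here.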

Original language: English
Pages (from-to): 2784-2786
Number of pages: 3
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2023-May
State: Published - 2023
Event: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 - London, United Kingdom
Duration: 29 May 2023 to 2 Jun 2023

Keywords

  • Externality
  • Multi-Agent Reinforcement Learning
  • Reward Shaping
  • Sequential Social Dilemmas
