TY - JOUR
T1 - Learning Roles with Emergent Social Value Orientations
AU - Li, Wenhao
AU - Wang, Xiangfeng
AU - Jin, Bo
AU - Lu, Jingyi
AU - Zha, Hongyuan
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Social dilemmas are situations in which individual rationality leads to collective irrationality. The multi-agent reinforcement learning community has leveraged ideas from social science, such as social value orientations (SVOs), to solve social dilemmas in complex cooperative tasks. In this paper, we first introduce the typical “division of labor or roles” mechanism in human society, and provide a promising solution for intertemporal social dilemmas (ISDs) with SVOs. A novel learning framework, called Learning Roles with Emergent SVOs (RESVO), is proposed to transform the learning of roles into the emergence of social value orientations, which is symmetrically solved by endowing agents with altruism to share rewards with other agents. An SVO-based role embedding space is then constructed by conditioning individual policies on roles with a novel rank regularizer and mutual information maximizer. Experiments show that RESVO achieves a stable division of labor and cooperation in ISDs of varying complexity.
AB - Social dilemmas are situations in which individual rationality leads to collective irrationality. The multi-agent reinforcement learning community has leveraged ideas from social science, such as social value orientations (SVOs), to solve social dilemmas in complex cooperative tasks. In this paper, we first introduce the typical “division of labor or roles” mechanism in human society, and provide a promising solution for intertemporal social dilemmas (ISDs) with SVOs. A novel learning framework, called Learning Roles with Emergent SVOs (RESVO), is proposed to transform the learning of roles into the emergence of social value orientations, which is symmetrically solved by endowing agents with altruism to share rewards with other agents. An SVO-based role embedding space is then constructed by conditioning individual policies on roles with a novel rank regularizer and mutual information maximizer. Experiments show that RESVO achieves a stable division of labor and cooperation in ISDs of varying complexity.
KW - Division of Labor
KW - Multi-Agent Reinforcement Learning
KW - Social Dilemma
KW - Social Value Orientation
UR - https://www.scopus.com/pages/publications/105019568615
U2 - 10.1109/TPAMI.2025.3620954
DO - 10.1109/TPAMI.2025.3620954
M3 - Article
AN - SCOPUS:105019568615
SN - 0162-8828
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -