Convergence of Policy Gradient Methods for Nash Equilibria in General-sum Stochastic Games

  • Yan Chen
  • Tao Li*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We study the learning of Nash equilibria in a general-sum stochastic game with an unknown transition probability density function. Agents take actions at the current environment state, and their joint action influences both the transition of the environment state and their immediate rewards. Each agent observes only the environment state and its own immediate reward, and does not know the actions or immediate rewards of the other agents. We introduce the concept of a weighted asymptotic Nash equilibrium with probability 1 and design a two-loop algorithm based on the equivalence between Nash equilibria and solutions of variational inequality problems. In the outer loop, we construct a sequence of strongly monotone variational inequalities by updating a proximal parameter, while in the inner loop we employ a single-call extra-gradient algorithm to solve the constructed variational inequality. We show that if the associated Minty variational inequality has a solution, then the designed algorithm converges to the $k^{1/2}$-weighted asymptotic Nash equilibrium.
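
For orientation, the two-loop structure described in the abstract can be sketched numerically. The Python code below is an illustrative sketch, not the paper's algorithm: the operator `F` is assumed to be an exactly evaluable pseudo-gradient of the variational-inequality reformulation (whereas the paper works with an unknown transition model and rewards observed from play), and the function name `two_loop_proximal_eg`, the fixed proximal weight `mu`, and the constant step size are hypothetical placeholders with no weighting schedule implemented.

```python
import numpy as np

def two_loop_proximal_eg(F, x0, outer_iters=50, inner_iters=200,
                         mu=1.0, step=0.1):
    """Sketch of a two-loop proximal / single-call extra-gradient scheme.

    Outer loop: anchor a proximal term at the current center, so the
    regularized operator G is strongly monotone even when F itself is not.
    Inner loop: single-call (optimistic) extra-gradient, which evaluates
    G once per iteration and extrapolates with the previous value.
    NOTE: all constants here are placeholders, not the paper's choices.
    """
    center = np.asarray(x0, dtype=float)
    for _ in range(outer_iters):
        def G(x, c=center):
            # Strongly monotone surrogate: original operator + proximal term.
            return F(x) + mu * (x - c)
        x = center.copy()
        g_prev = G(x)
        for _ in range(inner_iters):
            g = G(x)
            # Single-call extra-gradient step: 2*g - g_prev replaces the
            # second operator evaluation of the classical extra-gradient.
            x = x - step * (2.0 * g - g_prev)
            g_prev = g
        # Proximal-point update: the inner solution becomes the new center.
        center = x
    return center

# Toy check on the bilinear game f(u, v) = u * v, whose pseudo-gradient
# operator is F(u, v) = (v, -u) and whose unique equilibrium is the origin.
F = lambda x: np.array([x[1], -x[0]])
print(two_loop_proximal_eg(F, [1.0, 1.0]))  # approaches [0, 0]
```

The bilinear toy problem is only a sanity check: its pseudo-gradient operator is monotone, so the Minty condition in the abstract holds trivially and the iterates contract toward the origin.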

Original language: English
Title of host publication: IFAC-PapersOnLine
Editors: Hideaki Ishii, Yoshio Ebihara, Jun-ichi Imura, Masaki Yamakita
Publisher: Elsevier B.V.
Pages: 3435-3440
Number of pages: 6
Edition: 2
ISBN (Electronic): 9781713872344
DOIs
State: Published - 1 Jul 2023
Event: 22nd IFAC World Congress - Yokohama, Japan
Duration: 9 Jul 2023 – 14 Jul 2023

Publication series

Name: IFAC-PapersOnLine
Number: 2
Volume: 56
ISSN (Electronic): 2405-8963

Conference

Conference: 22nd IFAC World Congress
Country/Territory: Japan
City: Yokohama
Period: 9/07/23 – 14/07/23

Keywords

  • Nash equilibrium
  • Stochastic games
  • Policy gradient methods
