A General Theory Of Multiarmed Bandit Processes With Constrained Arm Switches

Wenqing Bao, Xiaoqiang Cai, Xianyi Wu

Research output: Contribution to journalArticlepeer-review

3 Scopus citations

Abstract

This paper develops a general theory on optimal allocation of multiarmed bandit (MAB) processes subject to arm switching constraints formulated as a general random time set. A Gittins index is constructed for each single arm, and the optimality of the corresponding Gittins index policy is proved. The constrained MAB model and the Gittins index policy established in this paper subsume those for MAB processes in continuous time, integer time, semi-Markovian, as well as general discrete time settings. Consequently, the new theory covers the classical MAB models as special cases and also applies to many other situations that have not yet been studied in the literature. While the proof of the optimality of the Gittins index policy benefits from ideas in the existing theory of MAB processes in continuous time, new techniques are introduced which drastically simplify the proof.

Original languageEnglish
Pages (from-to)4666-4688
Number of pages23
JournalSIAM Journal on Control and Optimization
Volume59
Issue number6
DOIs
StatePublished - 2021

Keywords

  • Gittins index
  • machine learning/reinforcement learning
  • multiarmed bandit processes
  • restricted stopping time
  • stochastic adaptive control

Fingerprint

Dive into the research topics of 'A General Theory Of Multiarmed Bandit Processes With Constrained Arm Switches'. Together they form a unique fingerprint.

Cite this