
Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems

  • East China Normal University

Research output: Contribution to journal › Article › peer-review

Abstract

The machine learning/statistics literature has so far largely considered multi-armed bandit (MAB) problems in which the rewards from every arm are assumed independent and identically distributed. For more general MAB models in which every arm evolves according to a rewarded Markov process, it is well known that the optimal policy is to pull an arm with the highest Gittins index. When the underlying distributions are unknown, an empirical Gittins index rule with ε-exploration (abbreviated as the empirical ε-Gittins index rule) is proposed to solve such MAB problems. This procedure is constructed by combining ε-exploration (for exploration) with empirical Gittins indices (for exploitation), the latter computed by applying the Largest-Remaining-Index algorithm to the estimated underlying distribution. The convergence of the empirical Gittins indices to the true Gittins indices, and of the expected discounted total rewards of the empirical ε-Gittins index rule to those of the oracle Gittins index rule, is established. A numerical simulation study demonstrates the behavior of the proposed policies, and their performance relative to the ε-mean reward rule is discussed.
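The abstract describes two ingredients: Gittins indices computed by the Largest-Remaining-Index algorithm, and an ε-exploration layer on top. As a minimal sketch (not the paper's implementation), the standard Varaiya-Walrand-Buyukkoc form of the Largest-Remaining-Index algorithm for a finite-state arm, plus a one-step ε-Gittins arm selector, might look as follows; the index normalization (expected discounted reward per unit of expected discounted time) and all function names are assumptions:

```python
import numpy as np

def gittins_indices(P, r, beta):
    """Gittins indices of a finite-state rewarded Markov chain via the
    Largest-Remaining-Index algorithm (Varaiya-Walrand-Buyukkoc form;
    an assumed standard implementation, not the paper's code).

    P    : (n, n) transition matrix of one arm
    r    : (n,)   state rewards
    beta : discount factor in (0, 1)
    Normalization: expected discounted reward per unit of expected
    discounted time, so a constant-reward chain has index equal to r.
    """
    n = len(r)
    nu = np.empty(n)
    top = int(np.argmax(r))          # the best-reward state has the largest index
    nu[top] = r[top]
    ranked = {top}
    while len(ranked) < n:
        # Keep only transitions into already-ranked states (the continuation set)
        Q = np.zeros((n, n))
        cols = sorted(ranked)
        Q[:, cols] = P[:, cols]
        d = np.linalg.solve(np.eye(n) - beta * Q, r)           # discounted reward
        b = np.linalg.solve(np.eye(n) - beta * Q, np.ones(n))  # discounted time
        remaining = [i for i in range(n) if i not in ranked]
        nxt = max(remaining, key=lambda i: d[i] / b[i])        # largest remaining index
        nu[nxt] = d[nxt] / b[nxt]
        ranked.add(nxt)
    return nu

def epsilon_gittins_arm(indices, states, eps, rng):
    """One step of an epsilon-Gittins rule: with probability eps pull a
    uniformly random arm (exploration); otherwise pull the arm whose
    (empirical) Gittins index at its current state is largest."""
    if rng.random() < eps:
        return int(rng.integers(len(indices)))
    return max(range(len(indices)), key=lambda a: indices[a][states[a]])
```

In the empirical version described by the abstract, `P` and `r` would be replaced by estimates built from observed transitions and rewards, and the indices recomputed as the estimates are refined.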

Original language: English
Article number: 107610
Journal: Computational Statistics and Data Analysis
Volume: 180
DOI
Publication status: Published - Apr 2023
