A Bayesian two‐armed bandit model
Author(s) - Wang Xikui, Liang You, Porth Lysa
Publication year - 2018
Publication title - Applied Stochastic Models in Business and Industry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.413
H-Index - 40
eISSN - 1526-4025
pISSN - 1524-1904
DOI - 10.1002/asmb.2355
Subject(s) - stochastic game, monotonic function, mathematical optimization, multi-armed bandit, dynamic programming, computer science, Bayesian probability, outcome (game theory), value (mathematics), term (time), sequence (biology), stochastic programming, mathematical economics, economics, mathematics, artificial intelligence, machine learning, regret, mathematical analysis, physics, quantum mechanics, biology, genetics
A two-armed bandit model using a Bayesian approach is formulated and investigated in this paper, with the goal of maximizing the value of a certain criterion of optimality. The bandit model illustrates the trade-off between exploration and exploitation, where exploration means acquiring scientific knowledge for better-informed decisions at later stages (i.e., maximizing long-term benefit), and exploitation means applying the current knowledge for the best possible outcome at the current stage (i.e., maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and to provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotonic sequence of break-even values. When both arms are unknown, we establish the optimality of the myopic strategy in a special case.
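As a rough illustration of the one-known-arm case, the Python sketch below applies finite-horizon stochastic dynamic programming to a Bernoulli two-armed bandit with a Beta posterior on the unknown arm, and binary-searches for the break-even value of the known arm's success rate. This is a minimal sketch under assumed simplifications (Bernoulli rewards, no discounting, a Beta(1, 1) prior in the demo), not the paper's exact model; the function names value and break_even are illustrative only.

from functools import lru_cache

def value(a, b, lam, n):
    """Optimal expected total reward over n pulls: arm 1 has unknown
    success probability with Beta(a, b) posterior; arm 2 has known
    success probability lam."""
    @lru_cache(maxsize=None)
    def V(a_, b_, m):
        if m == 0:
            return 0.0
        p = a_ / (a_ + b_)                   # posterior mean of the unknown arm
        pull_known = lam + V(a_, b_, m - 1)  # known arm: belief state is unchanged
        pull_unknown = (p * (1.0 + V(a_ + 1, b_, m - 1))     # success: update a
                        + (1.0 - p) * V(a_, b_ + 1, m - 1))  # failure: update b
        return max(pull_known, pull_unknown)
    return V(a, b, n)

def break_even(a, b, n, tol=1e-6):
    """Known-arm rate lam at which the first pull is indifferent between
    the two arms; as n grows, such break-even values form the kind of
    approximating sequence for the Gittins index described above."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        # Always pulling the known arm yields exactly lam * n, so a strictly
        # larger optimal value means the unknown arm is still worth trying.
        if value(a, b, lam, n) > lam * n:
            lo = lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, round(break_even(1, 1, n), 4))

For a uniform Beta(1, 1) prior the break-even value at horizon n = 1 is the posterior mean 0.5, and it increases with the horizon, reflecting the growing option value of exploration; the monotonicity of this sequence is the property the abstract refers to.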