The value of information in multi-armed bandits with exponentially distributed rewards
Author(s) -
Ilya O. Ryzhov,
Warren B. Powell
Publication year - 2011
Publication title -
procedia computer science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2011.04.147
Subject(s) - computer science , exponential distribution , exponential growth , exponential function , value of information , value (mathematics) , prior probability , mathematical optimization , bayesian probability , class (philosophy) , artificial intelligence , machine learning , mathematics , statistics , mathematical analysis
We consider a class of multi-armed bandit problems where the reward obtained by pulling an arm is drawn from an exponential distribution whose parameter is unknown. A Bayesian model with independent gamma priors is used to represent our beliefs and uncertainty about the exponential parameters. We derive a precise expression for the marginal value of information in this problem, which allows us to create a new knowledge gradient (KG) policy for making decisions. The policy is practical and easy to implement, making a case for value of information as a general approach to optimal learning problems with many different types of learning models
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom