Open Access
The value of information in multi-armed bandits with exponentially distributed rewards
Author(s) - Ilya O. Ryzhov, Warren B. Powell
Publication year - 2011
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2011.04.147
Subject(s) - computer science, exponential distribution, exponential growth, exponential function, value of information, value (mathematics), prior probability, mathematical optimization, bayesian probability, class (philosophy), artificial intelligence, machine learning, mathematics, statistics, mathematical analysis
We consider a class of multi-armed bandit problems where the reward obtained by pulling an arm is drawn from an exponential distribution whose parameter is unknown. A Bayesian model with independent gamma priors is used to represent our beliefs and uncertainty about the exponential parameters. We derive a precise expression for the marginal value of information in this problem, which allows us to create a new knowledge gradient (KG) policy for making decisions. The policy is practical and easy to implement, making a case for the value of information as a general approach to optimal learning problems with many different types of learning models.
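The gamma-exponential model in the abstract is conjugate. If the unknown rate λ of an arm has a Gamma(a, b) belief (shape a, rate b), then observing one reward w gives the posterior Gamma(a + 1, b + w). The expected reward of that arm is E[1/λ] = b / (a − 1), which is finite when a > 1. The sketch below is a minimal illustration and makes several assumptions. The goal is taken to be maximizing expected reward. The knowledge gradient of an arm is taken as the expected one-step improvement in the best posterior mean. This quantity is estimated by Monte Carlo sampling, not by the closed-form expression derived in the paper. All function names are hypothetical.

```python
import random


def posterior_update(a, b, reward):
    """Conjugate update of a Gamma(shape=a, rate=b) belief about an
    exponential rate parameter after observing one reward."""
    return a + 1.0, b + reward


def posterior_mean_reward(a, b):
    """Expected reward E[1/lambda] under a Gamma(a, b) belief.
    Finite only when the shape a exceeds 1."""
    if a <= 1:
        raise ValueError("mean reward is infinite for shape a <= 1")
    return b / (a - 1.0)


def sample_reward(a, b, rng):
    """Draw lambda from the current belief, then a reward from Exp(lambda)."""
    lam = rng.gammavariate(a, 1.0 / b)  # gammavariate expects a scale, i.e. 1/rate
    return rng.expovariate(lam)         # expovariate expects a rate


def knowledge_gradient(beliefs, x, n_samples=100_000, seed=0):
    """Monte Carlo estimate (an assumption, not the paper's closed form) of
        E[max_y mu_y^{n+1} | pull arm x] - max_y mu_y^n,
    where only arm x's belief changes after the pull.
    `beliefs` is a list of (a, b) pairs, one per arm, with at least two arms."""
    rng = random.Random(seed)
    means = [posterior_mean_reward(a, b) for a, b in beliefs]
    best_other = max(m for y, m in enumerate(means) if y != x)
    current_best = max(means)
    a, b = beliefs[x]
    total = 0.0
    for _ in range(n_samples):
        a_new, b_new = posterior_update(a, b, sample_reward(a, b, rng))
        # Best posterior mean after the hypothetical pull of arm x
        total += max(posterior_mean_reward(a_new, b_new), best_other)
    return total / n_samples - current_best
```

For example, `knowledge_gradient([(3.0, 2.0), (4.0, 3.6)], 0)` estimates how much one more pull of arm 0 (posterior mean 1.0) is expected to raise the best posterior mean, given that arm 1 currently has mean 1.2. A bandit policy would typically weigh this value of information against the immediate expected reward of each arm. That weighting is not shown here.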
