Open Access
Dependency of regret on accuracy of variance estimation for different versions of UCB strategy for Gaussian multi-armed bandits
Author(s) - Sergey Garbar
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2052/1/012013
Subject(s) - regret , variance (accounting) , gaussian , set (abstract data type) , dependency (uml) , estimation , statistics , mathematics , monte carlo method , econometrics , computer science , artificial intelligence , economics , physics , accounting , quantum mechanics , programming language , management
We consider two variations of the upper confidence bound (UCB) strategy for Gaussian two-armed bandits. The rewards of the arms are assumed to have unknown expected values and unknown variances. It is demonstrated that the expected regret of both strategies is a continuous function of the reward variance. A set of Monte Carlo simulations was performed to characterise the relation between the accuracy of variance estimation and the resulting losses. The regret is shown to grow only slightly even when the estimation error is fairly large, which makes it possible to estimate the variance during the initial steps of control and to stop this estimation afterwards.
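The record does not reproduce the paper's exact index formulas, so the following is only a minimal sketch of the kind of Monte Carlo study the abstract describes: a variance-aware UCB rule (assumed here to be a UCB1-Normal-style index built from the sample mean and sample variance) is run on a Gaussian two-armed bandit, and the expected regret is averaged over many simulation runs. The function name `ucb_regret` and the `var_scale` parameter, which deliberately over- or under-scales the estimated variance to mimic estimation error, are illustrative assumptions, not the authors' code.

```python
import numpy as np


def ucb_regret(means, stds, horizon=1000, runs=2000, var_scale=1.0, rng_seed=0):
    """Monte Carlo estimate of expected regret for a variance-aware UCB rule
    on a Gaussian bandit (sketch; not the paper's exact strategy).

    var_scale multiplies the sample variance used in the index, modelling an
    over- or under-estimate of the true reward variance.
    """
    rng = np.random.default_rng(rng_seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    k = len(means)
    best_mean = means.max()
    total_regret = 0.0

    for _ in range(runs):
        counts = np.zeros(k)
        sums = np.zeros(k)
        sq_sums = np.zeros(k)
        regret = 0.0
        for t in range(horizon):
            if t < 2 * k:
                # Pull every arm twice so a sample variance is defined.
                arm = t % k
            else:
                sample_mean = sums / counts
                sample_var = (sq_sums - counts * sample_mean**2) / (counts - 1)
                sample_var = np.maximum(sample_var, 1e-12) * var_scale
                # UCB1-Normal-style index: mean plus a variance-scaled bonus.
                bonus = np.sqrt(2.0 * sample_var * np.log(t + 1) / counts)
                arm = int(np.argmax(sample_mean + bonus))
            reward = rng.normal(means[arm], stds[arm])
            counts[arm] += 1
            sums[arm] += reward
            sq_sums[arm] += reward**2
            # Pseudo-regret: gap between the best mean and the chosen arm's mean.
            regret += best_mean - means[arm]
        total_regret += regret

    return total_regret / runs


if __name__ == "__main__":
    # Two arms with a small gap in means; varying the variance-estimate scale
    # shows how the expected regret reacts to estimation error.
    for scale in (0.5, 1.0, 2.0):
        r = ucb_regret(means=[0.0, 0.2], stds=[1.0, 1.0], var_scale=scale)
        print(f"variance scale {scale:>4}: expected regret ~ {r:.1f}")
```

Sweeping `var_scale` in this way mirrors the experiment the abstract reports: if the regret curve stays nearly flat for moderate misestimates of the variance, the variance only needs to be estimated during the initial steps of control.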
