Open Access
Bernoulli multi-armed bandit problem under delayed feedback
Author(s) - Andrii Dzhoha
Publication year - 2021
Publication title - vìsnik. serìâ fìziko-matematičnì nauki/vìsnik kiì̈vsʹkogo nacìonalʹnogo unìversitetu ìmenì tarasa ševčenka. serìâ fìziko-matematičnì nauki
Language(s) - English
Resource type - Journals
eISSN - 2218-2055
pISSN - 1812-5409
DOI - 10.17721/1812-5409.2021/1.2
Subject(s) - bernoulli's principle , computer science , online learning , software , mathematical optimization , learning environment , bernoulli distribution , bernoulli trial , artificial intelligence , mathematics , random variable , engineering , multimedia , statistics , programming language , aerospace engineering , mathematics education
Online learning under delayed feedback has recently been gaining attention. Learning with delays is the more natural setting in most practical applications, since feedback from the environment is rarely immediate. For example, the response to a drug in a clinical trial can take a while to observe. In this paper, we study the multi-armed bandit problem with Bernoulli-distributed rewards in an environment with delays by evaluating the Explore-First algorithm. We obtain upper bounds for the algorithm, and the theoretical results are applied to develop a software framework for conducting numerical experiments.
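
As an illustration of the setting described in the abstract, the following is a minimal Python sketch of Explore-First on Bernoulli arms with delayed feedback. It is not the paper's software framework. The function name `explore_first_delayed`, its parameters, the fixed (constant) delay, the round-robin exploration order, and the rule of committing on whatever feedback has arrived by the end of exploration are all assumptions made for this example.

```python
import random


def explore_first_delayed(means, horizon, pulls_per_arm, delay, seed=0):
    """Simulate Explore-First on Bernoulli arms with a fixed feedback delay.

    Hypothetical sketch: every name and modelling choice here is an
    assumption for illustration, not the paper's implementation.
    Returns (committed_arm, total_reward); committed_arm is None if the
    horizon ends before the exploration phase does.
    """
    rng = random.Random(seed)
    k = len(means)
    explore_rounds = k * pulls_per_arm
    successes = [0] * k  # sum of rewards already observed, per arm
    observed = [0] * k   # number of feedbacks already observed, per arm
    pending = []         # (arrival_round, arm, reward) not yet delivered
    committed = None
    total_reward = 0

    for t in range(horizon):
        # Deliver every feedback whose delay has elapsed by round t.
        still_pending = []
        for arrival, arm, reward in pending:
            if arrival <= t:
                successes[arm] += reward
                observed[arm] += 1
            else:
                still_pending.append((arrival, arm, reward))
        pending = still_pending

        if t < explore_rounds:
            # Exploration phase: pull the arms in round-robin order.
            arm = t % k
        else:
            if committed is None:
                # Commit once, using only the feedback that has arrived.
                committed = max(
                    range(k),
                    key=lambda a: successes[a] / observed[a] if observed[a] else 0.0,
                )
            arm = committed

        # Draw a Bernoulli reward; the learner sees it `delay` rounds later.
        reward = 1 if rng.random() < means[arm] else 0
        total_reward += reward
        if t < explore_rounds:
            pending.append((t + delay, arm, reward))

    return committed, total_reward


if __name__ == "__main__":
    arm, reward = explore_first_delayed(
        means=[0.3, 0.5, 0.7], horizon=5000, pulls_per_arm=100, delay=20
    )
    print("committed arm:", arm, "total reward:", reward)
```

In this sketch a longer delay means fewer observations are available when the commitment is made, which is one simple way to see how delays can degrade the quality of the chosen arm.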
