
Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits
Author(s) - Guojun Xiong, Jian Li, Rahul Singh
Publication year - 2022
Publication title - Proceedings of the ... AAAI Conference on Artificial Intelligence
Language(s) - Uncategorized
Resource type - Journals
eISSN - 2374-3468
pISSN - 2159-5399
DOI - 10.1609/aaai.v36i8.20852
Subject(s) - regret , reinforcement learning , markov decision process , asymptotically optimal algorithm , mathematical optimization , multi armed bandit , computer science , index (typography) , state (computer science) , time horizon , mathematics , markov process , artificial intelligence , algorithm , machine learning , statistics , world wide web