Two-armed bandit problem and batch version of the mirror descent algorithm | Zendy

Александр Валерианович Колногоров | Zendy; А. В. Колногоров | Zendy; Александр Викторович Назин | Zendy; Alexander Nazin | Zendy; Дмитрий Николаевич Шиян | Zendy; Dmitry Shiyan | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

Two-armed bandit problem and batch version of the mirror descent algorithm

Author(s) -

Александр Валерианович Колногоров,

А. В. Колногоров,

Александр Викторович Назин,

Alexander Nazin,

Дмитрий Николаевич Шиян,

Dmitry Shiyan

Publication year - 2021

Publication title -

matematičeskaâ teoriâ igr i eë priloženiâ

Language(s) - English

Resource type - Journals

ISSN - 2074-9872

DOI - 10.17076/mgta_2021_2_34

Subject(s) - minimax , a priori and a posteriori , algorithm , computer science , gaussian , upper and lower bounds , mathematical optimization , mathematics , philosophy , physics , epistemology , quantum mechanics , mathematical analysis

We consider the minimax setup for the two-armed bandit problem as applied to data processing if there are two alternative processing methods with different a priori unknown efficiencies. One should determine the most efficient method and provide its predominant application. To this end, we use the mirror descent algorithm (MDA). It is well-known that corresponding minimax risk has the order of $N^{1/2$ with $N$ being the number of processed data and this bound is unimprovable in order. We propose a batch version of the MDA which allows processing data by packets that is especially important if parallel data processing can be provided. In this case, the processing time is determined by the number of batches rather than by the total number of data. Unexpectedly, it turned out that the batch version behaves unlike the ordinary one even if the number of packets is large. Moreover, the batch version provides significantly smaller value of the minimax risk, i.e., it considerably improves a control performance. We explain this result by considering another batch modification of the MDA which behavior is close to behavior of the ordinary version and minimax risk is close as well. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of incomes in batches of data in the domain of ``close'' distributions and are obtained by Monte-Carlo simulations.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Empowering knowledge with every search

About

About Careers Publisher Partners Contact Us

Learn

FAQs Blog Terms of Use Privacy Policy

About

Learn

Discover

Explore