A modern Bayesian look at the multi‐armed bandit
Author(s) - Scott, Steven L.
Publication year - 2010
Publication title - Applied Stochastic Models in Business and Industry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.413
H-Index - 40
eISSN - 1526-4025
pISSN - 1524-1904
DOI - 10.1002/asmb.874
Subject(s) - stochastic game, multi-armed bandit, Thompson sampling, computer science, heuristics, posterior probability, Bayesian probability, matching (statistics), flexibility (engineering), probability distribution, mathematical optimization, randomized experiment, heuristic, reinforcement learning, artificial intelligence, machine learning, mathematics, mathematical economics, statistics, regret
A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
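For illustration, the sketch below shows randomized probability matching in the simplest setting the abstract describes: Bernoulli-reward arms with conjugate Beta posteriors, where each play goes to the arm whose posterior draw is largest. The arm success probabilities, the Beta(1, 1) prior, and the horizon are assumptions chosen for the example, not values taken from the article.

import random

def randomized_probability_matching(true_probs, n_rounds=1000, seed=0):
    """Play a Bernoulli bandit by sampling each arm's Beta posterior
    and pulling the arm with the largest sampled success probability."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # Beta prior: 1 pseudo-success per arm
    beta = [1] * k   # Beta prior: 1 pseudo-failure per arm
    total_reward = 0
    for _ in range(n_rounds):
        # One posterior draw per arm; choosing the argmax allocates plays
        # with probability equal to the posterior probability that the
        # arm is optimal, without computing that probability explicitly.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward       # posterior update on success
        beta[arm] += 1 - reward    # posterior update on failure
        total_reward += reward
    return total_reward, alpha, beta

if __name__ == "__main__":
    reward, alpha, beta = randomized_probability_matching([0.3, 0.5, 0.7])
    print("total reward:", reward)

Because the allocation rule only requires draws from the posterior, the same loop generalizes to any payoff distribution for which posterior sampling is tractable, which is the flexibility the abstract emphasizes.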
