StreamingBandit: Experimenting with Bandit Policies
Author(s) -
Jules Kruijswijk,
Robin van Emden,
Petri Parvinen,
Maurits Kaptein
Publication year - 2020
Publication title -
Journal of Statistical Software
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 7.636
H-Index - 145
ISSN - 1548-7660
DOI - 10.18637/jss.v094.i09
Subject(s) - computer science , Python (programming language) , multi-armed bandit , data science , management science , operations research , machine learning , artificial intelligence , regret
A large number of statistical decision problems in the social sciences and beyond can be framed as a (contextual) multi-armed bandit problem. However, it is notoriously hard to develop and evaluate policies that tackle these types of problems, and to use such policies in applied studies. To address this issue, this paper introduces StreamingBandit, a Python web application for developing and testing bandit policies in field studies. StreamingBandit can sequentially select treatments using (online) policies in real time. Once StreamingBandit is implemented in an applied context, different policies can be tested, altered, nested, and compared. StreamingBandit makes it easy to apply a multitude of bandit policies for sequential allocation in field experiments, and allows for the quick development and re-use of novel policies. In this article, we detail the implementation logic of StreamingBandit and provide several examples of its use.
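To make the abstract's notion of a bandit policy concrete, the sketch below shows a minimal epsilon-greedy policy in Python. This is an illustrative example only, not StreamingBandit's actual API: the class name, method names, and the choice of epsilon-greedy are assumptions made for exposition. It captures the select/update loop that an online policy runs for each incoming treatment decision.

```python
import random

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit policy: with probability epsilon pick
    a random arm (explore), otherwise pick the arm with the highest
    observed mean reward (exploit)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore
        # exploit: arm with the highest estimated mean reward
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        # incremental update of the running mean reward for this arm
        self.values[arm] += (reward - self.values[arm]) / n
```

In a deployed setting, `select_arm` would be called when a treatment must be assigned and `update` when the corresponding outcome (reward) is observed, which mirrors the real-time sequential allocation the paper describes.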