Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes
Author(s) -
Xavier Venel,
Bruno Ziliotto
Publication year - 2016
Publication title -
SIAM Journal on Control and Optimization
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.486
H-Index - 116
eISSN - 1095-7138
pISSN - 0363-0129
DOI - 10.1137/15m1043340
Subject(s) - Markov decision process, observable, stochastic game, limit (mathematics), partially observable Markov decision process, mathematical economics, dynamic programming, mathematical optimization, sigma, value (mathematics), Bellman equation, mathematics, Markov process, decision maker, Markov chain, decision problem, expected value, computer science, statistics, operations research, algorithm, physics, mathematical analysis, quantum mechanics
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the inferior limit of the time-average payoff.
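The two guarantees in the abstract can be written compactly in standard notation (this is a sketch in our own notation, assuming the usual n-stage value v_n, stage payoffs g_t, and expected n-stage payoff γ_n(σ) under strategy σ; it is not taken verbatim from the paper):

```latex
% Strong uniform value: for every \varepsilon > 0 there exists a pure
% strategy \sigma and a stage N(\varepsilon) such that
% (1) \sigma is \varepsilon-optimal in every sufficiently long n-stage problem:
\gamma_n(\sigma) \;\geq\; v_n - \varepsilon
    \quad \text{for all } n \geq N(\varepsilon),
% (2) \sigma guarantees the limit value, up to \varepsilon, in the
% infinite problem with the expected liminf time-average payoff:
\mathbb{E}_{\sigma}\!\left[\,\liminf_{n \to \infty}
    \frac{1}{n}\sum_{t=1}^{n} g_t \right]
    \;\geq\; \lim_{n \to \infty} v_n - \varepsilon .
```

The strength of the result lies in a single pure strategy σ satisfying both conditions simultaneously, rather than a different strategy for each horizon n.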