Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes
Author(s) -
Xavier Venel,
Bruno Ziliotto
Publication year - 2016
Publication title -
SIAM Journal on Control and Optimization
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.486
H-Index - 116
eISSN - 1095-7138
pISSN - 0363-0129
DOI - 10.1137/15m1043340
Subject(s) - Markov decision process, observable, stochastic game, limit (mathematics), partially observable Markov decision process, mathematical economics, dynamic programming, mathematical optimization, sigma, value (mathematics), Bellman equation, mathematics, Markov process, decision maker, Markov chain, decision problem, expected value, computer science, statistics, operations research, algorithm, physics, mathematical analysis, quantum mechanics
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the inferior limit of the time-average payoff.
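The two guarantees in the abstract can be written compactly in standard notation (this is a sketch in our own notation, assuming the usual n-stage value v_n, stage payoffs g_t, and expected n-stage payoff γ_n(σ) under strategy σ; it is not taken verbatim from the paper):

```latex
% Strong uniform value: for every \varepsilon > 0 there exists a pure
% strategy \sigma and a stage N(\varepsilon) such that
% (1) \sigma is \varepsilon-optimal in every sufficiently long n-stage problem:
\gamma_n(\sigma) \;\geq\; v_n - \varepsilon
    \quad \text{for all } n \geq N(\varepsilon),
% (2) \sigma guarantees the limit value, up to \varepsilon, in the
% infinite problem with the expected liminf time-average payoff:
\mathbb{E}_{\sigma}\!\left[\,\liminf_{n \to \infty}
    \frac{1}{n}\sum_{t=1}^{n} g_t \right]
    \;\geq\; \lim_{n \to \infty} v_n - \varepsilon .
```

The strength of the result lies in a single pure strategy σ satisfying both conditions simultaneously, rather than a different strategy for each horizon n.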