Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces
Author(s) -
Quanxin Zhu,
Xinsong Yang,
Chuangxia Huang
Publication year - 2009
Publication title -
Abstract and Applied Analysis
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.228
H-Index - 56
eISSN - 1687-0409
pISSN - 1085-3375
DOI - 10.1155/2009/103723
Subject(s) - mathematics , markov decision process , jump , set (abstract data type) , action (physics) , markov chain , markov process , mathematical optimization , statistics , computer science , physics , quantum mechanics , programming language
We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
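The paper's setting (continuous-time jump processes, Polish spaces, unbounded rates) is far more general than anything a short example can capture, but the basic shape of average-reward policy iteration is easy to see in the finite-state, discrete-time unichain case (which the bounded-rate continuous-time model reduces to via uniformization). The sketch below is illustrative only; the function name and the normalization `h(0) = 0` are choices made here, not the paper's construction. Each round evaluates the current stationary policy by solving the average reward optimality (Poisson) equation `h(s) + g = r(s, pi(s)) + sum_s' P(s'|s, pi(s)) h(s')` for the gain `g` and bias `h`, then improves the policy greedily:

```python
import numpy as np

def policy_iteration_avg(P, r, max_iter=100):
    """Average-reward policy iteration for a finite unichain MDP (sketch).

    P: (S, A, S) array of transition probabilities P(s'|s, a).
    r: (S, A) array of reward rates.
    Returns the optimal gain g, a bias function h, and a stationary policy pi.
    """
    S, A, _ = P.shape
    pi = np.zeros(S, dtype=int)  # arbitrary initial stationary policy
    for _ in range(max_iter):
        # Policy evaluation: solve h(s) + g = r(s, pi(s)) + sum_s' P h(s'),
        # with the normalization h(0) = 0 to pin down a unique solution.
        Ppi = P[np.arange(S), pi]            # (S, S) transition matrix under pi
        rpi = r[np.arange(S), pi]            # (S,) reward under pi
        M = np.eye(S) - Ppi                  # coefficients of h
        M = np.hstack([M, np.ones((S, 1))])  # last column multiplies g
        M = np.vstack([M, np.zeros(S + 1)])
        M[-1, 0] = 1.0                       # normalization row: h(0) = 0
        sol = np.linalg.lstsq(M, np.append(rpi, 0.0), rcond=None)[0]
        h, g = sol[:S], sol[S]
        # Policy improvement: greedy in r(s, a) + sum_s' P(s'|s, a) h(s').
        q = r + P @ h                        # (S, A) "relative" action values
        pi_new = np.argmax(q, axis=1)
        if np.array_equal(pi_new, pi):       # no improvement: pi is optimal
            return g, h, pi
        pi = pi_new
    return g, h, pi
```

For a two-state example where action 1 switches states (reward 0) and action 0 stays put with a little leakage (rewards 1 and 2 in states 0 and 1), the algorithm converges in two rounds to the policy "move to state 1, then stay", whose long-run average reward exceeds that of any other stationary policy. The unichain assumption matters: with multiple recurrent classes the gain need not be constant across states, and the single-`g` evaluation step above no longer applies.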