Technical Note—Improved Conditions for Convergence in Undiscounted Markov Renewal Programming
Author(s) - Loren K. Platzman
Publication year - 1977
Publication title - Operations Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.797
H-Index - 140
eISSN - 1526-5463
pISSN - 0030-364X
DOI - 10.1287/opre.25.3.529
Subject(s) - Markov chain , convergence , Markov decision process , mathematical optimization , mathematics , Markov process , dynamic programming , state (computer science) , transient state , Markov renewal process , Markov property , algorithm , computer science , operations research
In a simply connected Markov renewal problem, each state is either transient under all policies or an element of a single chain under some policy. This property is easily verified; it implies invariance of the maximal long-term average return (gain) with respect to the initial state, which in turn assures convergence of Odoni's bounds in the damped value-iteration algorithm due to Schweitzer, even when the maximal-gain process is multiple-chained and/or periodic.
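The convergence result the abstract describes can be illustrated in the special case of unit transition times, i.e. an ordinary undiscounted (average-reward) MDP. The sketch below is an assumption-laden illustration, not the paper's method: the 3-state, 2-action problem data are invented, and the routine simply combines Schweitzer's damped update v ← αT(v) + (1−α)v with Odoni's bounds min_s[T(v)−v](s) ≤ g ≤ max_s[T(v)−v](s), which close when the maximal gain g is independent of the initial state.

```python
import numpy as np

# Hypothetical 3-state, 2-action undiscounted MDP (all data illustrative).
rewards = np.array([[1.0, 0.5],
                    [0.0, 2.0],
                    [0.3, 0.8]])          # rewards[s, a]
P = np.array([                            # P[a, s, s'] transition probabilities
    [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]],
    [[0.9, 0.0, 0.1],
     [0.0, 0.2, 0.8],
     [0.4, 0.4, 0.2]],
])

def damped_value_iteration(rewards, P, alpha=0.5, tol=1e-8, max_iter=100_000):
    """Schweitzer-style damped value iteration with Odoni's gain bounds.

    The damped update v <- alpha*T(v) + (1 - alpha)*v acts like an
    aperiodicity transformation, so the bounds
        min_s [T(v) - v](s)  <=  g  <=  max_s [T(v) - v](s)
    converge whenever the maximal gain g is the same for every initial state.
    """
    v = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # One-step lookahead values Q[s, a] = r(s, a) + sum_j P(j | s, a) v(j)
        Q = rewards + np.tensordot(P, v, axes=([2], [0])).T
        Tv = Q.max(axis=1)
        lo, hi = (Tv - v).min(), (Tv - v).max()   # Odoni's bounds on the gain
        if hi - lo < tol:
            return (lo + hi) / 2, Q.argmax(axis=1)  # gain estimate, greedy policy
        v = alpha * Tv + (1.0 - alpha) * v          # damped update
    raise RuntimeError("Odoni's bounds did not converge")

gain, policy = damped_value_iteration(rewards, P)
```

In this toy problem every state communicates with state 2 under every policy, so the problem is unichain, the gain is state-independent, and the bounds close; without the damping factor the bounds could fail to converge for a periodic maximal-gain process.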