The policy iteration algorithm for average reward Markov decision processes with general state space
- 1 December 1997
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Automatic Control
- Vol. 42 (12), 1663-1680
- https://doi.org/10.1109/9.650016
Abstract
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular, where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the state space. Furthermore, under these conditions the sequence of relative value functions generated by the algorithm is bounded from below and "nearly" decreasing, from which it follows that the algorithm is always convergent. Under further conditions, it is shown that the algorithm does compute a solution to the optimality equations and hence an optimal average cost policy. These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost and recover known results for the standard linear-quadratic-Gaussian problem. In particular, in the control of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.Keywords
This publication has 30 references indexed in Scilit:
- Duality and linear programs for stability and performance analysis of queuing networks and scheduling policiesIEEE Transactions on Automatic Control, 1996
- Stability and convergence of moments for multiclass queueing networks via fluid limit modelsIEEE Transactions on Automatic Control, 1995
- Transience of Multiclass Queueing Networks Via Fluid Limit ModelsThe Annals of Applied Probability, 1995
- Markov Decision ProcessesWiley Series in Probability and Statistics, 1994
- Stability of Generalized Jackson NetworksThe Annals of Applied Probability, 1994
- Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processesAdvances in Applied Probability, 1993
- Markov Chains and Stochastic StabilityPublished by Springer Science and Business Media LLC ,1993
- Generalized resolvents and Harris recurrence of Markov processesContemporary Mathematics, 1993
- On the Poisson equation in the potential theory of a single kernel.MATHEMATICA SCANDINAVICA, 1991
- General Irreducible Markov Chains and Non-Negative OperatorsPublished by Cambridge University Press (CUP) ,1984