The policy iteration algorithm for average reward Markov decision processes with general state space

1 December 1997

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Automatic Control

Vol. 42 (12), 1663-1680
https://doi.org/10.1109/9.650016

Abstract

The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular, where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the state space. Furthermore, under these conditions the sequence of relative value functions generated by the algorithm is bounded from below and "nearly" decreasing, from which it follows that the algorithm is always convergent. Under further conditions, it is shown that the algorithm does compute a solution to the optimality equations and hence an optimal average cost policy. These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost and recover known results for the standard linear-quadratic-Gaussian problem. In particular, in the control of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.

Keywords

This publication has 30 references indexed in Scilit:

Duality and linear programs for stability and performance analysis of queuing networks and scheduling policies
IEEE Transactions on Automatic Control, 1996
Stability and convergence of moments for multiclass queueing networks via fluid limit models
IEEE Transactions on Automatic Control, 1995
Transience of Multiclass Queueing Networks Via Fluid Limit Models
The Annals of Applied Probability, 1995
Markov Decision Processes
Wiley Series in Probability and Statistics, 1994
Stability of Generalized Jackson Networks
The Annals of Applied Probability, 1994
Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes
Advances in Applied Probability, 1993
Markov Chains and Stochastic Stability
Published by Springer Science and Business Media LLC ,1993
Generalized resolvents and Harris recurrence of Markov processes
Contemporary Mathematics, 1993
On the Poisson equation in the potential theory of a single kernel.
MATHEMATICA SCANDINAVICA, 1991
General Irreducible Markov Chains and Non-Negative Operators
Published by Cambridge University Press (CUP) ,1984

Cited by 88 articles