
\begin{abstract}%

 We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain, and in which this graph has the following property: if each recurrent class is replaced by a single node, then the resulting graph is acyclic. For such MDPs, we prove the convergence of the stochastic dynamics associated with a version of optimistic policy iteration (OPI), suggested in \cite{tsitsiklis2002convergence}, in which the values associated with all the nodes visited during each iteration of OPI are updated. %

\end{abstract}
