\begin{abstract}
This note revisits the rolling-horizon control approach
to solving
a Markov decision process (MDP) under the infinite-horizon discounted expected-reward criterion.
In contrast to the classical value-iteration approach,
we develop an asynchronous on-line algorithm based on policy iteration integrated with
policy switching, a multi-policy improvement method.
A sequence of monotonically improving solutions to the forecast-horizon sub-MDP
is generated by updating the current solution only at the currently visited state, in effect building a rolling-horizon control policy for the MDP
over the infinite horizon.
Feedback from ``supervisors,'' if available, can also be
incorporated into the update.
We focus on convergence and its relation to the transition
structure of the MDP.
Depending on the transition structure, the algorithm achieves
either global convergence to an optimal forecast-horizon policy
or local convergence to a ``locally optimal'' fixed policy in finite time.
\end{abstract}