\begin{abstract}
This note revisits the rolling-horizon control approach
for solving a Markov decision process (MDP) under the infinite-horizon
discounted expected-reward criterion.
In contrast to the classical value-iteration approach,
we develop an asynchronous on-line algorithm based on policy iteration integrated with
a multi-policy improvement method of policy switching.
A sequence of monotonically improving solutions to the forecast-horizon sub-MDP
is generated by updating the current solution only at the currently visited state,
in effect building a rolling-horizon control policy for the MDP
over the infinite horizon.
Feedback from ``supervisors,'' if available, can also be
incorporated during the updates.
We focus on how convergence depends on the transition
structure of the MDP.
Depending on this structure, the algorithm achieves
either global convergence to an optimal forecast-horizon policy
or local convergence to a ``locally optimal'' fixed policy in finite time.
\end{abstract}