\begin{abstract}

We present, for the first time, an asymptotic convergence analysis of two time-scale stochastic approximation driven by
`controlled' Markov noise. In particular, both the faster and the slower recursions have non-additive controlled Markov noise components in
addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it
to limiting differential inclusions in both time-scales, which are defined in terms
of the ergodic occupation measures associated with the controlled Markov processes.
%We also point out that some additional assumptions are needed to complete the analysis of single time-scale controlled Markov noise framework of Borkar which
%motivates us to take the range of the controlled Markov processes as compact.
Finally, we use our results to solve the off-policy convergence problem for temporal difference
learning with linear function approximation.

\end{abstract}