\begin{abstract}
We present, for the first time, an asymptotic convergence analysis of two time-scale stochastic approximation driven by
`controlled' Markov noise. In particular, both the faster and the slower recursions have non-additive controlled Markov noise components in
addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it
to limiting differential inclusions on both time-scales; these inclusions are defined in terms
of the ergodic occupation measures associated with the controlled
Markov processes.
%We also point out that some additional assumptions are needed to complete the analysis of the single time-scale controlled Markov noise framework of Borkar, which
%motivates us to take the range of the controlled Markov processes to be compact.
Finally, using our results, we present a solution to the off-policy convergence problem for temporal difference
learning with linear function approximation.
\end{abstract}