\begin{abstract}
In this paper we present a `stability theorem' for stochastic approximation (SA)
algorithms with `controlled Markov' noise.
Such algorithms were first studied by \textbf{Borkar} in \textit{2006}.
%Sufficient conditions are presented for the `stability and convergence' of such algorithms.
Specifically, sufficient conditions are presented
%, that are consistent with the assumptions made by \textbf{Borkar},
that guarantee the stability of the iterates.
Further, under these conditions the iterates are shown to track a solution to the differential
inclusion defined in terms of the ergodic occupation measures associated with the
`controlled Markov' process. As an application of our main result,
we present an improvement to a general form of temporal difference learning algorithms.
Specifically, we present sufficient conditions for their stability and convergence using
our framework.
This paper builds on the works of \textbf{Borkar}
and \textbf{Benveniste, M\'etivier and Priouret}.
\end{abstract}
