\begin{abstract}

	Motivated by applications in Reinforcement Learning (RL), we study a nonlinear Stochastic Approximation (SA) algorithm under Markovian noise and derive finite-sample convergence bounds for it. Our proof is based on Lyapunov drift arguments; to handle the Markovian noise, we exploit the fast mixing of the underlying Markov chain.

	We use this result to establish finite-sample bounds for the popular $Q$-learning algorithm with linear function approximation for solving the RL problem. Since $Q$-learning with linear function approximation may diverge in general, we study it under a condition on the behavior policy that ensures the stability of the algorithm. Owing to the generality of our SA results, we need neither the unnatural assumption that the samples are i.i.d. (they are in fact Markovian) nor an additional projection step in the algorithm to keep the iterates bounded.

\end{abstract}
