\begin{abstract}
Stochastic Approximation (SA) is a widely used algorithmic approach
in various fields, including optimization and reinforcement learning
(RL). Among RL algorithms, Q-learning is particularly popular due
to its empirical success. In this paper, we study asynchronous Q-learning
with a constant stepsize, which is commonly used in practice for its
fast convergence. By connecting constant-stepsize Q-learning
to a time-homogeneous Markov chain, we show that the iterates converge
in distribution with respect to the Wasserstein distance and establish
an exponential convergence rate. We also establish a central limit
theorem for the Q-learning iterates, demonstrating the asymptotic
normality of the averaged iterates. Moreover, we provide an explicit
expansion of the asymptotic bias of the averaged iterate in the stepsize.
Specifically, the bias is proportional to the stepsize
up to higher-order terms, and we provide an explicit expression for the
linear coefficient. This precise characterization of the bias
enables the use of Richardson-Romberg (RR) extrapolation
to construct a new estimate that is provably closer to the optimal
Q-function. Numerical results corroborate our theoretical findings
on the improvement achieved by RR extrapolation.
\end{abstract}