\begin{abstract}
Stochastic approximation (SA) is a widely used algorithmic approach
in various fields, including optimization and reinforcement learning
(RL). Among RL algorithms, Q-learning is particularly popular owing
to its empirical success. In this paper, we study asynchronous Q-learning
with a constant stepsize, a choice commonly made in practice for its
fast convergence. By connecting constant-stepsize Q-learning
to a time-homogeneous Markov chain, we show that the iterates converge
in distribution in the Wasserstein distance and establish an exponential
rate for this convergence. We also establish a Central Limit Theorem
for the Q-learning iterates, demonstrating the asymptotic normality of
the averaged iterates. Moreover, we provide an explicit expansion of
the asymptotic bias of the averaged iterate in terms of the stepsize.
Specifically, the bias is proportional to the stepsize up to
higher-order terms, and we give an explicit expression for the linear
coefficient. This precise characterization of the bias allows us to
apply the Richardson--Romberg (RR) extrapolation technique to construct
a new estimate that is provably closer to the optimal Q-function.
Numerical results corroborate our theoretical findings on the
improvement achieved by RR extrapolation.
\end{abstract}
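As a minimal sketch of why RR extrapolation removes the leading bias
term, write $m(\alpha)$ for the asymptotic mean of the averaged iterate
under stepsize $\alpha$, and suppose it admits an expansion
$m(\alpha) = Q^* + \alpha B + R(\alpha)$ with remainder $R(\alpha) = o(\alpha)$.
The symbols $m$, $B$, and $R$ are introduced here only for illustration
and may differ from the notation used in the main text. Combining two
stepsizes $\alpha$ and $2\alpha$ then gives
\begin{align*}
2\,m(\alpha) - m(2\alpha)
  &= 2\bigl(Q^* + \alpha B + R(\alpha)\bigr)
     - \bigl(Q^* + 2\alpha B + R(2\alpha)\bigr) \\
  &= Q^* + 2R(\alpha) - R(2\alpha),
\end{align*}
and $2R(\alpha) - R(2\alpha) = o(\alpha)$ because
$R(2\alpha)/\alpha = 2\,R(2\alpha)/(2\alpha) \to 0$ as $\alpha \to 0$.
The term linear in $\alpha$ cancels exactly, so under this assumption
the extrapolated estimate has a bias of strictly smaller order than
either single-stepsize estimate.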