
\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file

% In this paper, we establish the first theoretical comparison between Double Q-learning and Q-learning. We consider linear function approximation, assuming that the optimal policy is unique and that both algorithms converge. To derive our results, we devise a general framework that can also analyze Double TD-learning and other algorithms with double estimators. Our main theoretical result is that, under these assumptions, if Double Q-learning is implemented using random updates from a single trace, then both its asymptotic convergence rate and its variance are no better than those of Q-learning, with appropriate step-sizes chosen for both algorithms.

% We also provide an example showing that Double Q-learning can perform significantly worse than Q-learning. We therefore conclude that Double Q-learning must be used with caution in practice.


In this paper, we establish a theoretical comparison between the asymptotic mean-squared errors of Double Q-learning and Q-learning. Our result builds on an analysis of linear stochastic approximation based on Lyapunov equations and applies both to the tabular setting and to the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also use simulations to illustrate practical implications of this theoretical observation.


\end{abstract}
