\begin{abstract}

This paper provides a theoretical understanding of the Deep Q-Network (DQN) with $\epsilon$-greedy exploration in deep reinforcement learning.

Despite the tremendous empirical success of the DQN, its theoretical characterization remains underexplored.

First, the exploration strategy is either impractical or ignored in existing analyses.

Second, in contrast to conventional Q-learning algorithms, the DQN employs a target network and experience replay to obtain an unbiased estimate of the mean-square Bellman error (MSBE) used in training the Q-network. However, existing theoretical analyses of DQNs either lack a convergence analysis or bypass the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient.

This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with an $\epsilon$-greedy policy. We prove that an iterative procedure with decaying $\epsilon$ converges geometrically to the optimal Q-value function. Moreover, higher values of $\epsilon$ enlarge the region of convergence but slow down the convergence, while the opposite holds for lower values of $\epsilon$. Experiments support our theoretical insights on DQNs.

\end{abstract}
