\begin{abstract}
Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications. However, current algorithms still suffer from high-variance gradient estimates, which result in unstable training and poor sample efficiency. In this paper, we propose an optimization strategy based on stochastic variance reduced gradient (SVRG) techniques. In extensive experiments on the Atari domain, our method outperforms the deep Q-learning baselines on $18$ out of $20$ games.
\begin{comment}
Stochastic optimization methods play a crucial role in the success of many machine learning applications. However, they often suffer from noisy data and high variance caused by random subsampling, as well as from the difficulty of learning rate scheduling, which leads to unstable training and slow convergence. We propose a novel optimization algorithm that combines the adaptive learning rates of Adam with SVRG to reduce the variance of the gradient direction estimate and accelerate convergence. We evaluate our approach on 20 games of the challenging Arcade Learning Environment and report significant improvements in both reward scores and learning speed.
\end{comment}
\end{abstract}