\begin{abstract}
    Many of the recent successes in deep reinforcement learning have been based on minimizing the squared Bellman error.
    However, training is often unstable because the target $Q$-values change quickly, so target networks are employed to regularize $Q$-value estimation and stabilize training by maintaining an additional set of lagging parameters.
    Despite their advantages, target networks are potentially an inflexible way to regularize $Q$-values, which may ultimately slow down training.
    In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike target networks, the regularization we propose is explicit, which allows us to use up-to-date parameters and to control the strength of the regularization directly. This leads to a training method that is faster yet more stable.
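    One plausible form of such an objective, given here only as a sketch under our own notational assumptions (a lagging parameter vector $\bar{\theta}$, a regularization weight $\kappa \ge 0$, and a stop-gradient operator $\mathrm{sg}$, none of which are specified above), is
    \[
        \mathcal{L}(\theta) = \mathbb{E}\Big[\big(r + \gamma \max_{a'} \mathrm{sg}\big(Q_{\theta}(s', a')\big) - Q_{\theta}(s, a)\big)^2\Big] + \kappa\, \mathbb{E}\Big[\big(Q_{\theta}(s, a) - Q_{\bar{\theta}}(s, a)\big)^2\Big],
    \]
    where the expectations are over sampled transitions $(s, a, r, s')$ and terminal transitions are ignored for brevity. In this sketch the bootstrap target uses the up-to-date parameters $\theta$, whereas a target network would place $Q_{\bar{\theta}}$ inside the target; setting $\kappa = 0$ recovers the unregularized squared Bellman error computed with up-to-date parameters.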
    We analyze the convergence of our method theoretically and empirically validate our predictions on simple environments as well as on a suite of Atari environments. We observe improvements over target-network-based methods in terms of both sample efficiency and final performance. In summary, our approach provides a fast and stable replacement for the standard squared Bellman error.

    % We analyze the convergence of our method theoretically in the linear function approximation setting and empirically validate our predictions on simple environments as well as on a suite of Atari environments.

\end{abstract}