
\begin{abstract}

Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but few theoretical foundations for these sampling schemes have been provided.

Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive heuristic, even though it can provide good performance.

In this work, we cast the replay buffer sampling problem as an importance sampling problem for estimating the gradient.

This allows us to derive the theoretically optimal sampling distribution, which yields the best theoretical convergence speed. %

Building on the knowledge of this ideal sampling scheme, we provide new theoretical foundations for Prioritized Experience Replay.

Since the optimal sampling distribution is intractable, we make several approximations that give good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer.

LaBER, which can be combined with Deep Q-Networks, distributional RL agents, or actor-critic methods, yields improved performance over a diverse range of Atari games and PyBullet environments, compared both to the base agent it is implemented on and to other prioritization schemes.

\end{abstract}
