\begin{abstract}
Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but very few theoretical foundations for these sampling schemes have been provided.
Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive heuristic, even though it can provide good performance.
In this work, we cast the replay buffer sampling problem as an importance sampling problem for estimating the gradient.
This allows us to derive the theoretically optimal sampling distribution, which yields the best theoretical convergence speed. %
Building on knowledge of this ideal sampling scheme, we provide new theoretical foundations for Prioritized Experience Replay.
Since the optimal sampling distribution is intractable, we make several approximations that give good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer.
LaBER, which can be combined with Deep Q-Networks, distributional RL agents or actor-critic methods, yields improved performance on a diverse range of Atari games and PyBullet environments, compared both to the base agent it is implemented on and to other prioritization schemes.
\end{abstract}