\begin{abstract}

Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems.
However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behavior. This paper introduces \ETAlgLong (\NoSpaceETAlg), which partitions an ER buffer into \NoSpaceET, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine \ETAlg with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of \ETAlg over existing ER buffer sampling approaches.

\end{abstract}
