bcac666bd7dc4f6e.tex
1: \begin{abstract}
2: Training an agent to solve control tasks directly from high-dimensional images with model-free reinforcement learning (RL) has proven difficult. The agent needs to learn a latent representation together with a control policy to perform the task. Fitting a high-capacity encoder using a scarce reward signal is not only sample-inefficient but also prone to suboptimal convergence.
3: Two ways to improve sample efficiency are to extract task-relevant features and to use off-policy algorithms.
4: We dissect various approaches to learning good latent features and conclude that the image reconstruction loss is the essential ingredient that enables efficient and stable representation learning in image-based RL.
5: Following these findings, we devise an off-policy actor-critic algorithm with an auxiliary decoder that trains end-to-end and matches the state-of-the-art performance of both model-free and model-based algorithms on many challenging control tasks. We release our code to encourage future research on image-based RL.\footnote{Code, results, and videos are available at https://sites.google.com/view/sac-ae/home}
6: \end{abstract}
7: