\begin{abstract}
    In reinforcement learning for partially observable environments, many successful algorithms have been developed within the asymmetric learning paradigm.
    This paradigm leverages additional state information available at training time for faster learning.
    Although the proposed learning objectives are usually theoretically sound, these methods still lack a theoretical justification for their potential benefits.
    We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting.
    The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.
\end{abstract}