\begin{abstract}
    In reinforcement learning for partially observable environments, many successful algorithms were developed within the asymmetric learning paradigm.
    This paradigm leverages additional state information available at training time for faster learning.
    Although the proposed learning objectives are usually theoretically sound, these methods still lack a theoretical justification for their potential benefits.
    We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting.
    The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.
\end{abstract}
