\begin{abstract}
Meta-Reinforcement Learning (\mtext{MRL}) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the \mtext{MRL} problem under the policy gradient formulation and propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that can be adapted to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning (\mtext{MEMRL}), learns a meta-policy that adapts to a distribution of tasks by efficiently updating the policy parameters with a combination of gradient-based optimization and Moreau envelope regularization. Moreau envelopes provide a smooth approximation of the policy optimization problem, which enables us to apply standard optimization techniques and converge to a stationary point. We provide a detailed analysis of \mtext{MEMRL} and establish a sublinear convergence rate to a first-order stationary point for non-convex policy gradient optimization. Finally, we demonstrate the effectiveness of \mtext{MEMRL} on a multi-task $2$D navigation problem.
\end{abstract}
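
For reference, the Moreau envelope mentioned in the abstract is commonly defined as follows. This is the standard textbook definition, not necessarily the exact surrogate used by \mtext{MEMRL}; the symbols $f$, $\theta$, $\vartheta$, and $\lambda$ are illustrative assumptions, and for a reward-maximization objective the envelope would typically be applied to the negated objective.
\[
  M_{\lambda f}(\theta) \;=\; \min_{\vartheta} \left\{ f(\vartheta) + \frac{1}{2\lambda} \|\vartheta - \theta\|^{2} \right\}, \qquad \lambda > 0 .
\]
When the minimizer $\hat{\vartheta}(\theta)$ is unique (for example, for convex $f$, or for $\rho$-weakly convex $f$ with $\lambda < 1/\rho$), the envelope is differentiable with
\[
  \nabla M_{\lambda f}(\theta) \;=\; \frac{1}{\lambda} \bigl( \theta - \hat{\vartheta}(\theta) \bigr),
\]
which is the sense in which it acts as a smooth approximation of $f$.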