\begin{abstract}
  We consider finite Markov decision processes (MDPs) with convex constraints and known dynamics.
  In principle, this problem is amenable to off-the-shelf convex optimization solvers, but
  this approach typically scales poorly.
  In this work, we develop a first-order algorithm, based on Douglas-Rachford splitting,
  that allows us to decompose the dynamics and constraints.
  Thanks to this decoupling, we can incorporate a wide variety of convex constraints.
  Our scheme consists of simple and easy-to-implement updates that
  alternate between solving a regularized MDP and computing a projection.
  The inherent presence of regularized updates ensures last-iterate convergence and numerical stability
  and, contrary to existing approaches, does not require us to regularize the problem explicitly.
  If the constraints cannot be satisfied, we exploit salient properties of the Douglas-Rachford algorithm
  to detect infeasibility and compute a policy that minimally violates the constraints.
  We demonstrate the performance of our algorithm on two benchmark problems and show
  that it compares favorably to competing approaches.
\end{abstract}