\begin{abstract}
We consider finite Markov decision processes (MDPs) with convex constraints and known dynamics.
In principle, this problem is amenable to off-the-shelf convex optimization solvers, but
this approach typically scales poorly.
In this work, we develop a first-order algorithm, based on Douglas--Rachford splitting,
that allows us to decompose the dynamics and constraints.
Thanks to this decoupling, we can incorporate a wide variety of convex constraints.
Our scheme consists of simple and easy-to-implement updates that
alternate between solving a regularized MDP and computing a projection.
Because these updates are inherently regularized, the scheme enjoys last-iterate convergence and numerical stability
and, contrary to existing approaches, does not require us to regularize the problem explicitly.
If the constraints are not attainable, we exploit salient properties of the Douglas--Rachford algorithm
to detect infeasibility and compute a policy that minimally violates the constraints.
We demonstrate the performance of our algorithm on two benchmark problems and show
that it compares favorably to competing approaches.
\end{abstract}
