abstract:9346029c598a6579.tex

1: \begin{abstract}

2:   We consider finite Markov decision processes (MDPs) with convex constraints and known dynamics.

3:   In principle, this problem is amenable to off-the-shelf convex optimization solvers, but typically

4:   this approach suffers from poor scalability.

5:   In this work, we develop a first-order algorithm, based on Douglas-Rachford splitting,

6:   that allows us to decompose the dynamics and constraints.

7:   Thanks to this decoupling, we can incorporate a wide variety of convex constraints.

8:   Our scheme consists of simple and easy-to-implement updates that

9:   alternate between solving a regularized MDP and a projection.

10:   The regularization inherent in these updates ensures last-iterate convergence and numerical stability

11:   and, contrary to existing approaches, does not require us to regularize the problem explicitly.

12:   If the constraints are not attainable, we exploit salient properties of the Douglas-Rachford algorithm

13:   to detect infeasibility and compute a policy that minimally violates the constraints.

14:   We demonstrate the performance of our algorithm on two benchmark problems and show

15:   that it compares favorably to competing approaches.

16: \end{abstract}

17: