
\begin{abstract}

\todo[inline]{If we want a differently written abstract, we can consider this one.}

In this paper, we consider the constrained Markov decision process (CMDP), in which an agent aims to maximize the expected discounted cumulative reward subject to a relatively small number of constraints on its costs.
We propose a new dual approach based on two ingredients: (i) an entropy-regularized policy optimizer and (ii) Vaidya's dual optimizer.
We show how these two techniques can be combined to achieve faster convergence, and we provide a finite-time error bound for the resulting method.
Even though the objective and constraints are nonconcave, we show that the proposed approach converges to the global optimum at a linear rate.
The resulting iteration complexity bounds, expressed in terms of both the optimality gap and the constraint violation, significantly improve upon those of existing primal-dual approaches.

\end{abstract}
