\begin{abstract}
\todo[inline]{If we want a differently written abstract, we can consider this one.}
We consider the constrained Markov decision process (CMDP), in which an agent aims to maximize its expected cumulative discounted reward subject to a relatively small number of constraints on its costs.
We propose a new dual approach that combines two ingredients: (i) an entropy-regularized policy optimizer and (ii) Vaidya's dual optimizer.
We show how these two techniques can be combined to achieve faster convergence, and we provide a finite-time error bound for the resulting method. Even though the objective and constraints are nonconcave,
we show that our approach converges to the global optimum at a linear rate.
The resulting iteration complexity bounds, stated in terms of both the optimality gap and the constraint violation, significantly improve upon those of existing primal-dual approaches.
\end{abstract}
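For concreteness, the constrained problem summarized above can be sketched in standard CMDP form. The notation here is assumed for illustration only and may differ from the paper's: $V_r^{\pi}(\rho)$ and $V_{c_i}^{\pi}(\rho)$ denote the expected discounted reward and $i$-th expected discounted cost of policy $\pi$ from initial distribution $\rho$, $b_i$ is the $i$-th cost budget, $m$ is the number of constraints, $\mathcal{H}(\pi)$ is a discounted entropy term, and $\tau > 0$ is its weight:
\[
\max_{\pi}\; V_r^{\pi}(\rho) \quad \text{subject to} \quad V_{c_i}^{\pi}(\rho) \le b_i, \qquad i = 1, \dots, m .
\]
A dual approach of the kind described would work with the entropy-regularized dual function
\[
d_{\tau}(\lambda) = \max_{\pi} \Big\{ V_r^{\pi}(\rho) + \tau\, \mathcal{H}(\pi) - \sum_{i=1}^{m} \lambda_i \big( V_{c_i}^{\pi}(\rho) - b_i \big) \Big\}, \qquad \lambda \in \mathbb{R}_{+}^{m},
\]
which is convex in $\lambda$ because it is a pointwise maximum of functions that are affine in $\lambda$, even though $V_r^{\pi}$ and $V_{c_i}^{\pi}$ are not concave in $\pi$. Up to the constant $\sum_{i} \lambda_i b_i$, evaluating $d_{\tau}(\lambda)$ amounts to solving an entropy-regularized MDP with reward $r - \sum_{i} \lambda_i c_i$, which is the job of the policy optimizer. The outer problem $\min_{\lambda \ge 0} d_{\tau}(\lambda)$ is then an $m$-dimensional convex problem, the setting in which a cutting-plane method such as Vaidya's applies, and a small number of constraints keeps that dimension low.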