\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method.
%By extending a recent analysis of PMD in the tabular setting,
We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities
%for both NPG and Q-NPG with log-linear policy parametrization
using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization.
Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with an arbitrary constant step size.
%unconstrained constant step sizes.

\paragraph{keywords}
  discounted Markov decision process, natural policy gradient, policy mirror descent, log-linear policy, sample complexity.
\end{abstract}