\begin{abstract}
In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning
based on Bregman divergences and momentum techniques.
Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration.
We further propose an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a variance-reduction technique.
Moreover, we provide a convergence analysis framework for our Bregman gradient policy optimization methods in the nonconvex setting.
We prove that our BGPO achieves a sample complexity of $O(\epsilon^{-4})$ for finding an $\epsilon$-stationary policy
while requiring only one trajectory at each iteration,
and that our VR-BGPO reaches the best known sample complexity of $O(\epsilon^{-3})$,
also requiring only one trajectory at each iteration.
In particular, by using different Bregman divergences, our BGPO framework unifies many existing policy optimization algorithms,
including existing (variance-reduced) policy gradient algorithms such as the natural policy gradient algorithm.
Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms.
\end{abstract}