\begin{abstract}
In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning
based on Bregman divergences and momentum techniques.
Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration.
We further propose an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a variance-reduced technique.
Moreover, we provide a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting.
We prove that BGPO achieves a sample complexity of $O(\epsilon^{-4})$ for finding an $\epsilon$-stationary policy
while requiring only one trajectory at each iteration,
and that VR-BGPO reaches the best-known sample complexity of $O(\epsilon^{-3})$,
also requiring only one trajectory at each iteration.
In particular, by using different Bregman divergences, our BGPO framework unifies many existing policy optimization algorithms,
such as (variance-reduced) policy gradient algorithms and the natural policy gradient algorithm.
Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms.
\end{abstract}