\begin{abstract}
	By using a parametric value function in place of Monte Carlo rollouts for value estimation, actor-critic (AC) algorithms reduce the variance of the stochastic policy gradient and thereby improve the convergence rate.
	While existing works mainly focus on analyzing the convergence rate of AC algorithms under Markovian noise, the impact of momentum on AC algorithms remains largely unexplored.
	In this work, we first propose a heavy-ball momentum based advantage actor-critic (\mbox{HB-A2C}) algorithm that integrates heavy-ball momentum into the critic recursion, where the critic is parameterized by a linear function.
	For sample trajectories generated by a Markov decision process, we quantitatively certify the acceleration capability of the proposed HB-A2C algorithm.
	Our theoretical results show that the proposed HB-A2C finds an $\epsilon$-approximate stationary point within $\oo{\epsilon^{-2}}$ iterations for reinforcement learning tasks with Markovian noise.
	We also reveal the dependence of the learning rates on the length of the sample trajectory.
	By carefully selecting the momentum factor of the critic recursion, the proposed HB-A2C can balance the errors introduced by the initialization and by the stochastic approximation.
\end{abstract}
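
\noindent As a rough illustration of how heavy-ball momentum could enter a linear critic, one common form of a momentum-augmented TD(0) update is sketched below. The notation is an assumption made for illustration (feature map $\phi$, critic parameter $\omega_t$, reward $r_t$, discount factor $\gamma$, critic step size $\beta$, momentum factor $\eta$) and may differ from the recursion analyzed in the paper:
\[
	\delta_t = r_t + \gamma\, \phi(s_{t+1})^{\top} \omega_t - \phi(s_t)^{\top} \omega_t ,
\]
\[
	\omega_{t+1} = \omega_t + \beta\, \delta_t\, \phi(s_t) + \eta \left( \omega_t - \omega_{t-1} \right).
\]
Under this sketch, setting $\eta = 0$ recovers the standard TD(0) critic update, while $\eta \in (0,1)$ reuses the previous displacement $\omega_t - \omega_{t-1}$.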