1: \begin{abstract}
2: We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient scheme that incorporates a smoothed functional-based gradient estimation scheme. We provide an asymptotic convergence guarantee for the proposed algorithm using the ordinary differential equation (ODE) approach. Further, we derive a non-asymptotic bound that quantifies the rate of convergence of the proposed algorithm.
3: \end{abstract}
4: