1: \begin{abstract}
2: Learning policies in an asynchronous parallel manner is essential to many successes of reinforcement learning (RL) on large-scale problems. However, the convergence performance of such methods has not yet been rigorously analyzed. To this end, we adopt the asynchronous parallel zero-order policy gradient (AZOPG) method to solve the continuous-time linear quadratic regulation problem. Specifically, multiple workers independently perform system rollouts to estimate zero-order policy gradients, which are then aggregated at a master for policy updates. As in the celebrated A3C algorithm, each worker is allowed to interact with the master {\em asynchronously}. By quantifying the convergence rate of AZOPG, we establish its linear speedup property, both in theory and in simulation, which reveals the advantage of using asynchronous parallel workers to learn policies.
3: \end{abstract}
4: