\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
We propose a single-timescale actor-critic algorithm for the linear quadratic regulator (LQR) problem. The critic is updated by a least-squares temporal difference (LSTD) method and the actor by a natural policy gradient method. We prove convergence with sample complexity $\mO(\ve^{-1} \log(\ve^{-1})^2)$. The proof technique applies to general single-timescale bilevel optimization problems. We also validate the theoretical convergence results numerically.
\end{abstract}