\begin{abstract}
The optimization landscape of optimal control problems plays an important role in the convergence of many policy gradient methods. Unlike the state-feedback linear quadratic regulator (LQR) setting, where a static policy suffices, static output-feedback policies are typically insufficient to achieve good closed-loop control performance. We investigate the optimization landscape of linear quadratic control using dynamic output-feedback policies, denoted as dynamic LQR (\texttt{dLQR}) in this paper. We first show that the \texttt{dLQR} cost varies with similarity transformations. We then derive an explicit form of the optimal similarity transformation for a given observable stabilizing controller. We further characterize the unique observable stationary point of \texttt{dLQR}, which provides an optimality certificate for policy gradient methods under mild assumptions. Finally, we discuss the differences and connections between \texttt{dLQR} and the canonical linear quadratic Gaussian (LQG) control. These results shed light on designing policy gradient algorithms for decision-making problems with partially observed information.
\end{abstract}