\begin{abstract}
We propose a policy iteration algorithm for solving the linear quadratic output feedback design problem with multiplicative noise. The algorithm solves a set of coupled Riccati equations for estimation and control that arise from a partially observable Markov decision process (POMDP) under a class of linear dynamic control policies.
In numerical experiments, the algorithm converges far faster than value iteration, previously the only known algorithm for solving this class of problems.
The results suggest promising directions for future research on policy optimization algorithms in more general POMDPs, including the potential to develop novel approximate data-driven approaches when model parameters are unavailable.
\end{abstract}