\begin{abstract}%
In this paper, we revisit model-free policy search on an important robust control benchmark, namely \(\mu\)-synthesis. In the general output-feedback setting, no convex formulation of this problem exists, and hence global optimality guarantees cannot be expected. \citet{apkarian2011nonsmooth} presented a nonconvex nonsmooth policy optimization approach for this problem and achieved state-of-the-art design results using subgradient-based policy search algorithms that generate update directions in a model-based manner. Despite the lack of convexity and global optimality guarantees, these subgradient-based policy search methods have led to impressive numerical results in practice.
Building on this policy optimization perspective, our paper extends these subgradient-based search methods to a model-free setting. Specifically, we examine the effectiveness of two model-free policy optimization strategies: the model-free non-derivative sampling method and zeroth-order policy search with uniform smoothing. Our extensive numerical study demonstrates that both methods consistently replicate the design outcomes achieved by their model-based counterparts. Additionally, we provide theoretical justification showing that convergence to stationary points can be guaranteed for our model-free \(\mu\)-synthesis approach under assumptions related to the coerciveness of the cost function.
Overall, our results demonstrate that derivative-free policy optimization offers a competitive and viable approach for solving general output-feedback \(\mu\)-synthesis problems in the model-free setting.
\end{abstract}