\begin{abstract}
We study derivative-free methods for policy optimization over the
class of linear policies. We focus on characterizing the convergence
rate of these methods when applied to linear-quadratic systems, and
consider various settings of driving noise and reward feedback. We show
that these methods provably converge to within any pre-specified
tolerance of the optimal policy with a number of zero-order
evaluations that is an explicit polynomial in the error tolerance,
dimension, and curvature properties of the problem. Our analysis
reveals some interesting differences between the settings of additive
driving noise and random initialization, as well as between the settings
of one-point and two-point reward feedback. Our theory is corroborated
by extensive simulations of derivative-free methods on these
systems. Along the way, we derive convergence rates for stochastic
zero-order optimization algorithms when applied to a certain class of
non-convex problems.
\end{abstract}
