\begin{abstract}
  We study derivative-free methods for policy optimization over the
  class of linear policies. We focus on characterizing the convergence
  rate of these methods when applied to linear-quadratic systems, and
  consider various settings of driving noise and reward feedback. We show
  that these methods provably converge to within any pre-specified
  tolerance of the optimal policy with a number of zero-order
  evaluations that is an explicit polynomial in the error tolerance,
  dimension, and curvature properties of the problem. Our analysis
  reveals some interesting differences between the settings of additive
  driving noise and random initialization, as well as between the settings
  of one-point and two-point reward feedback. Our theory is corroborated
  by extensive simulations of derivative-free methods on these
  systems. Along the way, we derive convergence rates for stochastic
  zero-order optimization algorithms when applied to a certain class of
  non-convex problems.
\end{abstract}