\begin{abstract}
  We study derivative-free methods for policy optimization over the
  class of linear policies. We focus on characterizing the convergence
  rate of these methods when applied to linear-quadratic systems, and
  study various settings of driving noise and reward feedback. We show
  that these methods provably converge to within any pre-specified
  tolerance of the optimal policy with a number of zero-order
  evaluations that is an explicit polynomial of the error tolerance,
  dimension, and curvature properties of the problem. Our analysis
  reveals some interesting differences between the settings of additive
  driving noise and random initialization, as well as the settings of
  one-point and two-point reward feedback. Our theory is corroborated
  by extensive simulations of derivative-free methods on these
  systems. Along the way, we derive convergence rates for stochastic
  zero-order optimization algorithms when applied to a certain class of
  non-convex problems.
\end{abstract}
