\begin{abstract}
Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in many fields, from artificial intelligence and robotics to medicine and finance.
This paper considers the problem of learning control policies for unknown linear dynamical systems that maximize a quadratic reward function.
We present a method that optimizes the expected value of the reward with respect to the posterior distribution of the unknown system parameters, given observed data.
The algorithm is based on sequential convex programming and enjoys reliable local convergence and robust stability guarantees.
We demonstrate the approach in numerical simulations and on the stabilization of a real-world inverted pendulum, observing strong performance and robustness in both settings.
\end{abstract}
