
\begin{abstract}

Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics to medicine and finance.

This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function.

We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data.

The algorithm involves sequential convex programming and enjoys reliable local convergence and robust stability guarantees.

Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both.

\end{abstract}
