abstract:496aab5c2d9a6557.tex

1: \begin{abstract}

2:     Model predictive control can optimally control nonlinear systems subject to constraints.

3:     %

4:     The control performance depends on the model accuracy and the prediction horizon.

5:     %

6:     %

7:     Recent work proposes applying reinforcement learning to a parameterized model predictive controller to recover the optimal control performance even if an imperfect model or a short prediction horizon is used.

8:     However, common reinforcement learning algorithms rely on first-order updates, which have only a linear convergence rate and hence require an excessive amount of dynamic data.

9:     %

10:     %

11:     Higher-order updates are typically intractable if the policy is approximated with neural networks because of the large number of parameters.

12:     %

13:

14:     %

15:     In this work, we use a parameterized model predictive controller as the policy and leverage the small number of required parameters to propose a trust-region constrained quasi-Newton training algorithm for policy optimization with a superlinear convergence rate.

16:     %

17:     %

18:     We show that the required second-order derivative information can be computed by solving a linear system of equations.

19:     %

20:     A simulation study illustrates that the proposed training algorithm outperforms other algorithms in terms of data efficiency and accuracy.

21:     \blfootnote{The authors are with the chair of Process Automation Systems at the department of Biochemical and Chemical Engineering, TU Dortmund University, 44227 Dortmund, Germany (e-mail: dean.brandner@tu-dortmund.de; sergio.lucia@tu-dortmund.de). \\ This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 466380688 – within the Priority Program “SPP 2331: Machine Learning in Chemical Engineering”.}

22: \end{abstract}

23: