\begin{abstract}
This article introduces a novel framework for data-driven linear quadratic regulator (LQR) design.
First, we present a reinforcement learning paradigm for on-policy data-driven LQR, in which exploration and exploitation are performed simultaneously while guaranteeing robust stability of the whole closed-loop system, encompassing the plant and the control/learning dynamics.
Then, we propose Model Reference Adaptive Reinforcement Learning (MR-ARL), a control architecture integrating tools from reinforcement learning and model reference adaptive control.
The approach relies on a variable reference model containing the currently identified value function.
An adaptive stabilizer is then used to ensure convergence of the applied policy to the optimal one, convergence of the plant to the optimal reference model, and overall robust closed-loop stability.
The proposed framework provides theoretical robustness certificates against real-world perturbations such as measurement noise, plant nonlinearities, or slowly varying parameters.
The effectiveness of the proposed architecture is validated via realistic numerical simulations.
\end{abstract}