\begin{abstract}
This paper studies an infinite-horizon optimal control problem for
discrete-time linear systems with quadratic criteria, both having random
parameters that are independent and identically distributed over
time. A classical approach is to solve an algebraic Riccati
equation that involves mathematical expectations and requires certain
statistical information about the parameters. In this paper, we propose
an online iterative algorithm in the spirit of Q-learning for the
situation where only one random sample of the parameters is observed at each time step.
The first theorem proves the equivalence
of three properties: the convergence of the learning sequence, the
well-posedness of the control problem, and the solvability of the
algebraic Riccati equation. The second theorem shows that the adaptive
feedback control defined in terms of the learning sequence stabilizes the
system as long as the control problem is well-posed. Numerical examples
are presented to illustrate our results.
\end{abstract}