\begin{abstract}
Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer theory for convergence, and the possibility of making use of greater a~priori knowledge regarding policy or value function structure. This paper explores algorithm design in the continuous-time domain, with a finite-horizon optimal control objective. The main contributions are:
\begin{romannum}
	\item
	Algorithm design is based on a new \textit{Q-ODE}, which defines a model-free characterization of the Hamilton--Jacobi--Bellman equation.

	\item
	The Q-ODE motivates a new formulation of Convex Q-learning that avoids the approximations appearing in prior work. The Bellman error used in the algorithm is defined by filtered measurements, which is beneficial in the presence of measurement noise.

	\item
	A characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete-time setting.

	\item
	The theory is illustrated in application to resource allocation for distributed energy resources, for which it is ideally suited.
\end{romannum}
\end{abstract}