\begin{abstract}
Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer theory for convergence and by the opportunity to exploit greater a~priori knowledge of policy or value function structure. This paper explores algorithm design in the continuous-time domain, with a finite-horizon optimal control objective. The main contributions are:
\begin{romannum}
	\item
	Algorithm design is based on a new \textit{Q-ODE}, which defines a model-free characterization of the Hamilton--Jacobi--Bellman equation.

	\item
	The Q-ODE motivates a new formulation of Convex Q-learning that avoids the approximations appearing in prior work.
	The Bellman error used in the algorithm is defined by filtered measurements, which is beneficial in the presence of measurement noise.

	\item
	A characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete-time setting.

	\item
	The theory is illustrated in an application to resource allocation for distributed energy resources, a setting for which the theory is ideally suited.
\end{romannum}
\end{abstract}