\begin{abstract}
In this paper, we present a novel algorithm named synchronous integral
Q-learning, which is based on synchronous policy iteration, to solve
continuous-time infinite-horizon optimal control problems
for input-affine systems. In this method, the integral reinforcement is
measured as an excitation signal to estimate the solution to the
Hamilton--Jacobi--Bellman equation. Moreover, the proposed method
is completely model-free, i.e., no \emph{a priori} knowledge of the
system is required. Using policy iteration, the critic and
actor neural networks simultaneously approximate the optimal value function
and the optimal policy. A persistence-of-excitation condition is required
to guarantee the convergence of the two networks. Unlike
traditional policy iteration algorithms, this method relaxes the
requirement of an initial admissible policy. The effectiveness of the proposed
algorithm is verified through numerical simulations.
\end{abstract}