\begin{abstract}
In this paper, we present a novel algorithm, named synchronous integral
Q-learning and based on synchronous policy iteration, for solving
continuous-time infinite-horizon optimal control problems
for input-affine systems. In this method, the integral reinforcement is measured
as an excitation signal and used to estimate the solution to the
Hamilton--Jacobi--Bellman equation. Moreover, the proposed method
is completely model-free, i.e., no \emph{a priori} knowledge of the
system is required. Using policy iteration, the actor and
critic neural networks simultaneously approximate the optimal value function
and policy. The persistence of excitation condition is required
to guarantee the convergence of the two networks. Unlike
traditional policy iteration algorithms, this method relaxes the
requirement of an initial admissible policy. The effectiveness of the proposed
algorithm is verified through numerical simulations.
\end{abstract}
