\begin{abstract}
We present a novel algorithm, synchronous integral Q-learning, which is
based on synchronous policy iteration and solves continuous-time
infinite-horizon optimal control problems for input-affine systems.
In this method, the integral reinforcement is measured as an excitation
signal to estimate the solution to the Hamilton--Jacobi--Bellman equation.
Moreover, the proposed method is completely model-free, i.e., no
\emph{a priori} knowledge of the system is required. Using policy
iteration, the critic and actor neural networks simultaneously
approximate the optimal value function and the optimal policy,
respectively. A persistence of excitation condition is required to
guarantee the convergence of the two networks. Unlike traditional policy
iteration algorithms, this method relaxes the requirement of an initial
admissible policy. The effectiveness of the proposed algorithm is
verified through numerical simulations.
\end{abstract}
