abstract:0ddca07400dfdd38.tex

\begin{abstract}
We are motivated by the real challenges presented in a human-robot
system to develop new designs that are efficient at the data level and
provide performance guarantees, such as stability and optimality, at the
system level. Existing approximate/adaptive dynamic programming (ADP)
results that consider system performance theoretically do not readily
provide practically useful learning control algorithms for this problem,
and reinforcement learning (RL) algorithms that address the issue of data
efficiency usually do not have performance guarantees for the controlled
system. This study fills these important voids by introducing innovative
features to the policy iteration algorithm. We introduce flexible
policy iteration (FPI), which can flexibly and organically integrate
experience replay and supplemental values from prior experience into
the RL controller. We show system-level performance properties, including
convergence of the approximate value function, (sub)optimality of the
solution, and stability of the system. We demonstrate the effectiveness
of FPI via realistic simulations of the human-robot system. We note
that the problem addressed in this study may be difficult to solve
with design methods based on classical control theory, as it is nearly
impossible to obtain a customized mathematical model of a human-robot
system either online or offline. The results we have obtained also
indicate the great potential of RL control for solving realistic and
challenging problems with high-dimensional control inputs.
\end{abstract}