\begin{abstract}
Motivated by the real challenges posed by a human-robot system, we
develop new designs that are data efficient and that provide
system-level performance guarantees such as stability and optimality.
Existing approximate/adaptive dynamic programming (ADP) results that
treat system performance theoretically do not readily provide
practically useful learning control algorithms for this problem, while
reinforcement learning (RL) algorithms that address data efficiency
usually lack performance guarantees for the controlled system. This
study addresses both gaps by introducing new features to the policy
iteration algorithm. We introduce flexible policy iteration (FPI),
which can flexibly and organically integrate experience replay and
supplemental values from prior experience into the RL controller. We
establish system-level performance properties, including convergence
of the approximate value function, (sub)optimality of the solution,
and stability of the system. We demonstrate the effectiveness of FPI
via realistic simulations of the human-robot system. The problem we
face in this study may be difficult to address with design methods
based on classical control theory, since it is nearly impossible to
obtain a customized mathematical model of a human-robot system either
online or offline. The results we have obtained also indicate the
great potential of RL control for solving realistic and challenging
problems with high-dimensional control inputs.
\end{abstract}