\begin{abstract}
We propose a data-driven framework for modeling
and optimizing human-machine interaction processes,
e.g., systems that assist humans in decision-making
or learning, workload allocation, and interactive advertising.
This is a challenging problem for several reasons.
First, human behavior is hard to model or
infer, as it may reflect biases, long-term memory, and
sensitivity to sequencing, i.e., it exhibits transience and a complexity
that grows exponentially in the length of the interaction.
Second, due to the interactive nature of such processes,
the machine policy used to engage with the human
may bias possible data-driven inferences.
Finally, in choosing machine policies that optimize interaction rewards,
one must avoid, on the one hand, being
overly sensitive to error or variability in the estimated
human model and, on the other, being overly
deterministic or predictable, which may result in poor
human `engagement' in the interaction.
To meet these challenges, we propose a robust
approach based on the maximum entropy principle,
the Alternating Entropy-Reward Ascent (AREA) algorithm, which
iteratively estimates human behavior and optimizes the machine policy.
We characterize AREA in terms of its space and time complexity and
its convergence. We also provide an initial validation based on synthetic
data generated by an established noisy nonlinear model for
human decision-making.
\end{abstract}