\begin{abstract}
There is increasing interest in using streaming data to inform
decision making across a wide range of application domains including
mobile health, food safety, security, and resource management. A
decision support system formalizes online decision making as a map
from up-to-date information to a recommended decision. Online estimation
of an optimal decision strategy from streaming data requires
simultaneous estimation of components of the underlying system
dynamics as well as the optimal decision strategy given these dynamics; thus,
there is an inherent trade-off between choosing decisions that lead to
improved estimates and choosing decisions that appear to be
optimal based on current estimates. Thompson (1933) was
among the first to formalize this trade-off in the context of choosing
between two treatments for a stream of patients; he proposed a simple
heuristic wherein a treatment is selected randomly at each time point with selection
probability proportional to the posterior probability that it is
optimal. We consider a variant of Thompson sampling that is simple
to implement and can be
applied to large and complex decision problems. We
show that the proposed Thompson sampling estimator is
consistent for the optimal decision support system
and provide rates of convergence and finite sample error bounds.
The proposed algorithm is illustrated using two examples: an agent-based
model of the spread of influenza on a network, and the management of
mallard populations in the United States.
\end{abstract}