\begin{abstract}
There is increasing interest in using streaming data to inform
decision making across a wide range of application domains including
mobile health, food safety, security, and resource management. A
decision support system formalizes online decision making as a map
from up-to-date information to a recommended decision. Online estimation
of an optimal decision strategy from streaming data requires
simultaneous estimation of components of the underlying system
dynamics as well as the optimal decision strategy given these dynamics; thus,
there is an inherent trade-off between choosing decisions that lead to
improved estimates and choosing decisions that appear to be
optimal based on current estimates. Thompson (1933) was
among the first to formalize this trade-off in the context of choosing
between two treatments for a stream of patients; he proposed a simple
heuristic wherein a treatment is selected randomly at each time point with selection
probability proportional to the posterior probability that it is
optimal. We consider a variant of Thompson sampling that is simple
to implement and can be
applied to large and complex decision problems. We
show that the proposed Thompson sampling estimator is
consistent for the optimal decision support system
and provide rates of convergence and finite sample error bounds.
The proposed algorithm is illustrated using an agent-based model
of the spread of influenza on a network and management of mallard populations
in the United States.
\end{abstract}