\begin{abstract}
There is increasing interest in using streaming data to inform
decision making across a wide range of application domains including
mobile health, food safety, security, and resource management. A
decision support system formalizes online decision making as a map
from up-to-date information to a recommended decision. Online estimation
of an optimal decision strategy from streaming data requires
simultaneous estimation of components of the underlying system
dynamics as well as the optimal decision strategy given these dynamics; thus,
there is an inherent trade-off between choosing decisions that lead to
improved estimates and choosing decisions that appear to be
optimal based on current estimates. Thompson (1933) was
among the first to formalize this trade-off in the context of choosing
between two treatments for a stream of patients; he proposed a simple
heuristic wherein a treatment is selected randomly at each time point with selection
probability proportional to the posterior probability that it is
optimal. We consider a variant of Thompson sampling that is simple
to implement and can be
applied to large and complex decision problems. We
show that the proposed Thompson sampling estimator is
consistent for the optimal decision support system
and provide rates of convergence and finite-sample error bounds.
The proposed algorithm is illustrated using an agent-based model
of the spread of influenza on a network and management of mallard populations
in the United States.
\end{abstract}
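
To make the two-treatment heuristic attributed to Thompson (1933) concrete, the following is a minimal illustrative sketch and not the variant proposed in this paper. The notation is assumed here for illustration only: treatments $a \in \{0,1\}$ have unknown success probabilities $\theta_a$, outcomes are binary, $H_t$ denotes the data observed before time $t$, and the priors are taken to be independent $\mathrm{Beta}(1,1)$. Under these assumptions, treatment $1$ is selected at time $t$ with probability
\[
  \pi_t \;=\; \Pr\left(\theta_1 > \theta_0 \mid H_t\right),
  \qquad
  \theta_a \mid H_t \;\sim\; \mathrm{Beta}\left(1 + S_{a,t},\; 1 + F_{a,t}\right),
\]
where $S_{a,t}$ and $F_{a,t}$ count the successes and failures observed under treatment $a$ before time $t$. Equivalently, one may draw $\tilde\theta_a$ independently from each posterior and assign the treatment with the larger draw, which selects treatment $1$ with probability exactly $\pi_t$ and avoids computing $\pi_t$ explicitly.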