ee09a406ff2ccd94.tex
1: \begin{abstract}%
2: % Monte Carlo search can return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples.
3: % Monte Carlo search accuracy is typically sensitive to the number of simulations conducted. 
4: % Monte Carlo methods require a sufficient number of samples be drawn to produce accurate estimates.
5: % The accuracy of Monte Carlo methods tends to depend upon the number of samples generated. 
6: The theoretical asymptotic bounds provided for many Monte Carlo methods cannot be computed from quantities known during search.
7: As a result, search results often carry no principled measure of confidence and may be sub-optimal when the search terminates prematurely.
8: % As a result, Monte Carlo search can often return sub-optimal recommendations as a result of premature termination.
9: % Known asymptotic regret bounds do not provide any way to measure confidence of a recommended action at the conclusion of search.
10: In this work, we prove two sets of bounds on the error rate of Monte Carlo search over non-stationary bandits and Markov decision processes. 
11: The presented bounds hold for general Monte Carlo solvers meeting mild convergence conditions.
12: These bounds can be computed directly at the conclusion of the search and do not require knowledge of the true action values, so they can serve as search stopping criteria.
13: We also provide a simple method, based on the presented bounds, for estimating the probability of sub-optimality.
14: We empirically test the tightness of the bounds and accuracy of the estimator through experiments on a multi-armed bandit and a discrete Markov decision process.
15: % For each task, we evaluate a simple Monte Carlo solver and Monte Carlo tree search.
16: \end{abstract}
17: