\begin{abstract}%
% Monte Carlo search can return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples.
% Monte Carlo search accuracy is typically sensitive to the number of simulations conducted.
% Monte Carlo methods require a sufficient number of samples be drawn to produce accurate estimates.
% The accuracy of Monte Carlo methods tends to depend upon the number of samples generated.
The theoretical asymptotic bounds established for many Monte Carlo methods cannot be computed from quantities known during search.
Search results therefore often lack a principled measure of confidence and may be sub-optimal due to premature termination.
% As a result, Monte Carlo search can often return sub-optimal recommendations as a result of premature termination.
% Known asymptotic regret bounds do not provide any way to measure confidence of a recommended action at the conclusion of search.
In this work, we prove two sets of bounds on the error rate of Monte Carlo search over non-stationary bandits and Markov decision processes.
The bounds hold for general Monte Carlo solvers that satisfy mild convergence conditions.
These bounds can be computed directly at the conclusion of search and do not require knowledge of the true action values, so they can be used as stopping criteria for search.
We also provide a simple method, based on these bounds, for estimating the probability that a recommended action is sub-optimal.
We empirically evaluate the tightness of the bounds and the accuracy of the estimator through experiments on a multi-armed bandit and a discrete Markov decision process.
% For each task, we evaluate a simple Monte Carlo solver and Monte Carlo tree search.
\end{abstract}