\begin{abstract}
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision
Processes (MDPs) and Partially Observable MDPs (POMDPs), focusing on the
well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a
tree with nodes (states) and edges (actions) is built incrementally by
expanding nodes, and the value of each node is updated through a
backup strategy based on the average value of its child nodes. However,
it has been shown that, given enough samples, the maximum operator yields
more accurate node value estimates than averaging. Instead of settling for
either of these estimates, we go a step further and propose a novel backup
strategy that uses the power mean operator, which computes a value between
the average and the maximum. We call our approach \alg{} and argue that
the power mean operator helps speed up learning in MCTS. We analyze our
method theoretically, providing guarantees of convergence to the optimum.
Finally, we empirically demonstrate the effectiveness of our method on
well-known MDP and POMDP benchmarks, showing significant improvements in
performance and convergence speed with respect to state-of-the-art
algorithms.
\end{abstract}
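
For concreteness, the power mean of a set of nonnegative child values can be
written as below. The symbols $X_i$, $w_i$, $n$, and $p$ are introduced here
only for illustration; reading the weights $w_i$ as child visit counts is an
assumption of this sketch, not a statement of the exact \alg{} update.
\[
  \mathrm{M}_p(X) = \left( \frac{\sum_{i=1}^{n} w_i X_i^{p}}{\sum_{i=1}^{n} w_i} \right)^{1/p},
  \qquad X_i \ge 0,\; w_i > 0,\; p \ge 1.
\]
Setting $p = 1$ recovers the weighted average used by UCT, while
$\mathrm{M}_p(X) \to \max_i X_i$ as $p \to \infty$; since $\mathrm{M}_p$ is
nondecreasing in $p$, every intermediate $p$ yields a value between the two.
For example, with two equally weighted children of values $0.2$ and $1$, we
obtain $\mathrm{M}_1 = 0.6$, $\mathrm{M}_2 = \sqrt{0.52} \approx 0.72$, and a
maximum of $1$.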