\begin{abstract}

An intelligent agent performs actions in order to achieve its goals. Such
actions can be either externally directed, such as opening a door, or
internally directed, such as writing data to a memory location or strengthening
a synaptic connection. Some internal actions, which we refer to as
computations, potentially help the agent choose better actions. Considering
that (external) actions and computations might draw upon the same resources,
such as time and energy, deciding when to act or compute, as well as what to
compute, is critical to the performance of an agent.

In an environment that provides rewards depending on an agent's behavior, an
action's value is typically defined as the sum of expected long-term rewards
following the action (itself a complex quantity that depends on what the agent
goes on to do after the action in question). However, defining the value of a
computation is not as straightforward, as computations are only valuable in a
higher-order way, through the alteration of actions.

This thesis offers a principled way of computing the value of a computation in
a planning setting formalized as a Markov decision process. We present two
definitions of computation value: static and dynamic. They address
two extreme cases of the computation budget, which affords calculating either
zero or infinitely many steps into the future. We show that these values have
desirable properties, such as temporal consistency and asymptotic convergence.

Furthermore, we propose methods for efficiently computing and approximating the
static and dynamic computation values. We describe a sense in which the
policies that greedily maximize these values can be optimal. Finally, we
use these principles to construct Monte Carlo tree search algorithms that
outperform most state-of-the-art methods in finding higher-quality
actions given the same simulation resources.

\end{abstract}