\begin{abstract}

An intelligent agent performs actions in order to achieve its goals. Such
actions can be either externally directed, such as opening a door, or
internally directed, such as writing data to a memory location or strengthening
a synaptic connection. Some internal actions, which we refer to as
computations, can help the agent choose better actions. Because (external)
actions and computations may draw on the same resources, such as time and
energy, deciding when to act or compute, as well as what to compute, is
critical to the performance of an agent.

In an environment that provides rewards depending on an agent's behavior, an
action's value is typically defined as the expected sum of long-term rewards
following the action (itself a complex quantity that depends on what the agent
goes on to do after the action in question). Defining the value of a
computation is less straightforward, because computations are valuable only in
a higher-order way, through the changes they induce in the agent's actions.

This thesis offers a principled way of computing the value of a computation in
a planning setting formalized as a Markov decision process. We present two
definitions of computation value: static and dynamic. They address two extreme
cases of the computation budget: one that affords zero further steps of
computation in the future, and one that affords infinitely many. We show that
these values have desirable properties, such as temporal consistency and
asymptotic convergence.

Furthermore, we propose methods for efficiently computing and approximating the
static and dynamic computation values. We describe a sense in which the
policies that greedily maximize these values can be optimal. Finally, we use
these principles to construct Monte Carlo tree search algorithms that
outperform most state-of-the-art methods at finding higher-quality actions
given the same simulation resources.

\end{abstract}