\begin{abstract}
In this paper we propose a novel algorithm, factored value
iteration (FVI), for the approximate solution of factored Markov
decision processes (fMDPs). The traditional approximate value
iteration algorithm is modified in two ways. First, the
least-squares projection operator is modified so that it does not
increase the max-norm, and thus preserves convergence. Second, we
uniformly sample polynomially many states from the (exponentially
large) state space. As a result, the complexity of our algorithm is
polynomial in the description length of the fMDP. We prove that the
algorithm is convergent. We also derive an upper bound on the
difference between our approximate solution and the optimal one, as
well as on the error introduced by sampling. We analyze various
projection operators with respect to their computational complexity
and their convergence when combined with approximate value iteration.

\keywords{factored Markov decision process, value iteration,
reinforcement learning}
\end{abstract}