\begin{abstract}
In this paper we propose a novel algorithm, factored value
iteration (FVI), for the approximate solution of factored Markov
decision processes (fMDPs). The traditional approximate value
iteration algorithm is modified in two ways. First, the
least-squares projection operator is modified so that it does not
increase the max-norm, and thus preserves convergence. Second, we
draw polynomially many samples uniformly from the (exponentially
large) state space. This way, the complexity of our algorithm
becomes polynomial in the description length of the fMDP. We prove
that the algorithm is convergent. We also derive an upper bound on
the difference between our approximate solution and the optimal
one, as well as on the error introduced by sampling. We analyze
various projection operators with respect to their computational
complexity and their convergence when combined with approximate
value iteration.

\keywords{factored Markov decision process, value iteration,
reinforcement learning}
\end{abstract}
