\begin{abstract}
We propose universal randomized function approximation-based
empirical value learning (EVL) algorithms for Markov decision processes.
The `empirical' nature comes from performing each iteration with
samples of the next state obtained from simulation, which makes
the Bellman operator a random operator. EVL is then combined with two
function approximation methods: a parametric one, using a parametric
function space, and a non-parametric one, using a reproducing kernel
Hilbert space (RKHS). Both function spaces have the universal function
approximation property. Basis functions are picked randomly. Convergence
analysis is done using a random operator framework with techniques
from the theory of stochastic dominance. Finite-time sample complexity
bounds are derived for both universal approximate dynamic programming
algorithms. Numerical experiments support the versatility and
computational tractability of this approach.
\end{abstract}