abstract:6dab77e7ad2f617a.tex

1: \begin{abstract}

2:

3: The Zap~Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that its transient behavior closely matches that of a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence.

4:

5: The analysis suggests that the approach will lead to stable and efficient computation even in non-ideal parameterized settings. Numerical experiments confirm fast convergence, even in such non-ideal cases. The comparison plot on this first page, taken from \Fig{6stateBEPlot} of this paper, illustrates the acceleration in convergence achieved by the new algorithm.

6:

7: A secondary goal of this paper is tutorial in nature. The first half of the paper contains a survey of reinforcement learning algorithms, with a focus on minimum variance algorithms.

8:

9:

10: \medskip

11:

12: {\small

13: 	\noindent

14: 	\textbf{Keywords:}

15: 	Reinforcement learning,

16: 	Q-learning,

17: 	Stochastic optimal control}

18: \smallskip

19:

20: {\small

21: 	\noindent

22: 	\textbf{2000 AMS Subject Classification:}

23: 	93E20,	%  	Optimal stochastic control

24: 	93E35	%  	Stochastic learning and adaptive control

25: 	%60J20  	%Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]

26: 	%60J22  	%Computational methods in Markov chains [See also 65C40]

27:

28: 	% 60J10,          %  chains with discrete parameter

29: 	%60J25,          % Markov processes with continuous parameter

30: 	%37A30,          % Ergodic theorems, spectral theory, Markov operators

31: 	% 60F10,          % Large deviations

32: 	%47H99.          % nonlinear operators

33: }

34:

35:

36:

37: \vfill

38:

39: 			\Ebox{1}{6State_BEPlot_Beta08099.pdf}

40: \vfill

41:

42: \end{abstract}

43: