\begin{abstract}

The Zap~Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and its recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that its transient behavior closely matches that of a deterministic Newton-Raphson implementation. This is made possible by a two-time-scale update equation for the matrix-gain sequence.
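As a minimal sketch, assume a generic root-finding problem $\bar f(\theta^*)=0$ with $\bar f(\theta)=\mathsf{E}[f(\theta,X)]$ and Jacobian $A(\theta)=\partial_\theta \bar f(\theta)$. The symbols $f$, $A$, $\alpha_n$, $\gamma_n$ and $\rho$ are introduced here for illustration only and are not taken from the paper's notation. Under these assumptions, a two-time-scale matrix-gain recursion of the kind described above can be written as follows (the \texttt{align*} environment requires \texttt{amsmath}):

```latex
% Hypothetical notation: f, A, alpha_n, gamma_n, rho are illustrative only.
\begin{align*}
  % Faster time scale: running estimate of the Jacobian (the matrix gain)
  \widehat{A}_{n+1} &= \widehat{A}_n + \gamma_{n+1}\bigl[A_{n+1} - \widehat{A}_n\bigr],
  \qquad A_{n+1} = \partial_\theta f(\theta_n, X_{n+1}),
  \\
  % Slower time scale: Newton-Raphson-like parameter update
  \theta_{n+1} &= \theta_n - \alpha_{n+1}\,\widehat{A}_{n+1}^{-1} f(\theta_n, X_{n+1}),
  \qquad \alpha_n = 1/n,\quad \gamma_n = n^{-\rho},\ \rho\in(\tfrac12,1).
\end{align*}
```

Because $\gamma_n/\alpha_n\to\infty$, the matrix gain $\widehat{A}_n$ evolves on the faster time scale and tracks $A(\theta_n)$. The associated ODE is then the Newton-Raphson flow $\frac{d}{dt}\vartheta_t = -A(\vartheta_t)^{-1}\bar f(\vartheta_t)$, which is equivalent to $\frac{d}{dt}\bar f(\vartheta_t) = -\bar f(\vartheta_t)$.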

The analysis suggests that the approach will lead to stable and efficient computation even in non-ideal parameterized settings. Numerical experiments confirm fast convergence, even in such non-ideal cases. The comparison plot on this first page, taken from \Fig{6stateBEPlot}, illustrates the dramatic acceleration in convergence achieved by the new algorithm.

A secondary goal of this paper is tutorial: the first half contains a survey of reinforcement learning algorithms, with a focus on minimum-variance algorithms.


\medskip

{\small
	\noindent
	\textbf{Keywords:}
	Reinforcement learning,
	Q-learning,
	Stochastic optimal control}
\smallskip

{\small
	\noindent
	\textbf{2000 AMS Subject Classification:}
	93E20,	%  	Optimal stochastic control
	93E35	%  	Stochastic learning and adaptive control
	%60J20  	%Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]
	%60J22  	%Computational methods in Markov chains [See also 65C40]
	
	% 60J10,          %  chains with discrete parameter
	%60J25,          % Markov processes with continuous parameter
	%37A30,          % Ergodic theorems, spectral theory, Markov operators
	% 60F10,          % Large deviations
	%47H99.          % nonlinear operators
}

\vfill

\Ebox{1}{6State_BEPlot_Beta08099.pdf}
\vfill

\end{abstract}
