\begin{abstract}
	In this paper, we study Temporal Difference (TD) learning with linear
	value function approximation.
	%
	It is well known that most TD learning algorithms are unstable when
	linear function approximation is combined with off-policy learning.
	%
	The recent development of \emph{Gradient TD} (GTD)
	algorithms has addressed this problem successfully.
	%
	However, the success of GTD algorithms requires a set of
	well-chosen features, which are not always available.
	%
	When the number of features is huge, GTD algorithms may
	overfit and become computationally expensive.
	%
	To cope with this difficulty, regularization techniques, in
	particular $\ell_{1}$ regularization, have attracted significant attention
	in the development of TD learning algorithms.
	%
	The present work combines GTD algorithms with $\ell_{1}$ regularization.
%	, which is known to be a simple effective mechanism for automatic feature selection.
	%
	We propose a family of $\ell_{1}$-regularized GTD algorithms, which employ
	the well-known soft-thresholding operator.
	%
	We investigate the convergence properties of the proposed algorithms and illustrate
	their performance with several numerical experiments. \vspace{2mm}
\end{abstract}