\begin{abstract}
	In this paper, we study Temporal Difference (TD) learning with linear
	value function approximation.
	%
	It is well known that most TD learning algorithms can become unstable when
	linear function approximation is combined with off-policy learning.
	%
	The recent development of \emph{Gradient TD} (GTD)
	algorithms has addressed this problem successfully.
	%
	However, the success of GTD algorithms requires a set of
	well-chosen features, which are not always available.
	%
	When the number of features is large, GTD algorithms may
	overfit and become computationally expensive.
	%
	To cope with this difficulty, regularization techniques, in
	particular $\ell_{1}$ regularization, have attracted significant attention
	in the development of TD learning algorithms.
	%
	The present work combines GTD algorithms with $\ell_{1}$ regularization.
%	, which is known to be a simple effective mechanism for automatic feature selection.
	%
	We propose a family of $\ell_{1}$-regularized GTD algorithms, which employ
	the well-known soft thresholding operator.
	%
	We investigate the convergence properties of the proposed algorithms and illustrate
	their performance with several numerical experiments. \vspace{2mm}
\end{abstract}