1: \begin{abstract}
2: In this paper, we study Temporal Difference (TD) learning with linear
3: value function approximation.
4: %
5: It is well known that most TD learning algorithms can become unstable when
6: linear function approximation is combined with off-policy learning.
7: %
8: The recent development of \emph{Gradient TD} (GTD)
9: algorithms has addressed this problem successfully.
10: %
11: However, the success of GTD algorithms requires a set of
12: well-chosen features, which are not always available.
13: %
14: When the number of features is very large, GTD algorithms may
15: overfit and become computationally expensive.
16: %
17: To cope with this difficulty, regularization techniques, in
18: particular $\ell_{1}$ regularization, have attracted significant attention
19: in developing TD learning algorithms.
20: %
21: The present work combines the GTD algorithms with $\ell_{1}$ regularization.
22: % , which is known to be a simple effective mechanism for automatic feature selection.
23: %
24: We propose a family of $\ell_{1}$-regularized GTD algorithms, which employ
25: the well-known soft thresholding operator.
26: %
27: We investigate the convergence properties of the proposed algorithms and illustrate
28: their performance with several numerical experiments. \vspace{2mm}
29: \end{abstract}
30: