\begin{abstract}
In this paper we provide a rigorous convergence analysis of an off-policy temporal difference learning algorithm
with linear function approximation and linear per-time-step computational complexity in the online learning setting. The algorithm considered here is
TDC with importance weighting, introduced by Maei et al. We support our theoretical results with
empirical results on standard off-policy counterexamples.
\end{abstract}
