\begin{abstract}
In this paper we provide a rigorous convergence analysis of an off-policy temporal difference learning algorithm
with linear function approximation and linear per-time-step computational complexity in the online learning setting. The algorithm considered is
TDC with importance weighting, introduced by Maei et al. We support our theoretical results with
an empirical evaluation on standard off-policy counterexamples.
\end{abstract}