
\begin{abstract}
In this paper we provide a rigorous convergence analysis of an off-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in the ``online'' learning setting. The algorithm considered here is TDC with importance weighting, introduced by Maei et al. We support our theoretical results with empirical results on standard off-policy counterexamples.
\end{abstract}
