abstract:b7890535e98f04f9.tex

1: \begin{abstract}

2:   Temporal difference (TD) learning with linear function approximation, abbreviated as linear TD, is a classic and powerful prediction algorithm in reinforcement learning.

3:   While it is well understood that linear TD converges almost surely to a unique point, this convergence traditionally requires the assumption that the features used by the approximator are linearly independent.

4:   However, this linear independence assumption does not hold in many practical scenarios. This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features.

5:   In fact, we do not make any assumptions on the features. We prove that the approximated value function converges to a unique point and the weight iterates converge to a set.

6:   We also establish a notion of local stability of the weight iterates.

7:   Importantly, we do not need to introduce any other additional assumptions and do not need to make any modification to the linear TD algorithm. Key to our analysis is a novel characterization of bounded invariant sets of the mean ODE of linear TD.

8: \end{abstract}

9: