\begin{abstract}
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions.
%
We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds.
%
We establish connections between continual learning in the linear setting and two other research areas: alternating projections and the Kaczmarz method.
%
In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas. %
In particular, when $T$ tasks in $d$ dimensions are presented cyclically for $k$ iterations, we prove an upper bound of $T^2\min\{1/\sqrt{k},d/k\}$ on the forgetting.
%
This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results.
%
We further show that the $T^2$ factor can be lifted when tasks are presented in a random ordering.
\end{abstract}