\begin{abstract}
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions.
%
We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds.
%
We establish connections between continual learning in the linear setting and two other research areas --
alternating projections and the Kaczmarz method.
%
In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas. %
In particular, when $T$ tasks in $d$ dimensions are presented cyclically for $k$ iterations, we prove an upper bound of $T^2\min\{1/\sqrt{k},d/k\}$ on the forgetting.
%
This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results.
%
We further show that the $T^2$ factor can be lifted when tasks are presented in a random ordering.
\end{abstract}