\begin{abstract}
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions.
%
We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds.
%
We establish connections between continual learning in the linear setting and two other research areas:
alternating projections and the Kaczmarz method.
%
In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas.
%
In particular, when $T$ tasks in $d$ dimensions are presented cyclically for $k$ iterations, we prove an upper bound of $T^2\min\{1/\sqrt{k},d/k\}$ on the forgetting.
%
This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results.
%
We further show that the $T^2$ factor can be lifted when tasks are presented in a random ordering.
\end{abstract}