
\begin{abstract}
The training dynamics of linear networks are well studied in two distinct
setups: the lazy regime and the balanced/active regime, depending on the
initialization and width of the network. We provide a surprisingly
simple unifying formula for the evolution of the learned matrix that
contains as special cases both the lazy and balanced regimes, as well as
a mixed regime in between the two. In the mixed regime, one part of
the network is lazy while the other is balanced. More precisely, the
network is lazy along singular values that are below a certain threshold
and balanced along those that are above the same threshold. At initialization,
all singular values are lazy, allowing the network to align itself
with the task, so that later in time, when some of the singular values
cross the threshold and become active, they converge rapidly (convergence
in the balanced regime is notoriously difficult in the absence of
alignment). The mixed regime is the `best of both worlds': it converges
from any random initialization (in contrast to balanced dynamics, which
require special initialization) and has a low-rank bias (absent in
the lazy dynamics). This allows us to prove an almost complete phase
diagram of training behavior as a function of the variance at initialization
and the width, for an MSE training task.
\end{abstract}
