\begin{abstract}
The training dynamics of linear networks are well studied in two distinct
setups: the lazy regime and the balanced/active regime, depending on the
initialization and width of the network. We provide a surprisingly
simple unifying formula for the evolution of the learned matrix that
contains as special cases both the lazy and balanced regimes, as well as
a mixed regime in between the two. In the mixed regime, one part of
the network is lazy while the other is balanced. More precisely, the
network is lazy along singular values that are below a certain threshold
and balanced along those that are above the same threshold. At initialization,
all singular values are lazy, allowing the network to align itself
with the task, so that later in time, when some of the singular values
cross the threshold and become active, they converge rapidly (convergence
in the balanced regime is notoriously difficult in the absence of
alignment). The mixed regime is the `best of both worlds': it converges
from any random initialization (in contrast to balanced dynamics, which
require special initialization), and has a low-rank bias (absent in
the lazy dynamics). This allows us to prove an almost complete phase
diagram of training behavior as a function of the variance at initialization
and the width, for an MSE training task.
\end{abstract}