\begin{abstract}
This paper establishes risk convergence and
asymptotic weight matrix alignment
(a form of implicit regularization)
for gradient flow and gradient descent applied to deep linear networks
on linearly separable data.
In more detail, for gradient flow applied to strictly decreasing
loss functions (with similar results for gradient descent with
particular decreasing step sizes):
(i) the risk converges to $0$;
(ii) the normalized $i^\text{th}$ weight matrix asymptotically equals its
rank-$1$ approximation $u_iv_i^\top$;
and (iii) these rank-$1$ matrices are
aligned across layers, meaning $|v_{i+1}^\top u_i|\to1$.
In the case of the logistic loss (binary cross entropy), more
can be said: the linear function induced by the network
(the product of its weight matrices)
converges to the same direction as the maximum margin solution.
This last property was identified in prior work,
but only under assumptions on gradient descent that,
here, are implied by the alignment phenomenon.
\end{abstract}
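
As a rough illustration of how properties (ii) and (iii) bear on the final claim, consider an idealized case, with notation introduced only for this sketch: the network has $L$ layers and computes $x\mapsto W_L\cdots W_1x$, and each normalized weight matrix $W_i/\|W_i\|$ equals its rank-$1$ approximation $u_iv_i^\top$ exactly, with $u_i$ and $v_i$ unit vectors. Since each $v_{i+1}^\top u_i$ is a scalar, the product telescopes:
\[
\frac{W_L\cdots W_1}{\prod_{i=1}^{L}\|W_i\|}
= u_Lv_L^\top\,u_{L-1}v_{L-1}^\top\cdots u_1v_1^\top
= \Bigl(\prod_{i=1}^{L-1} v_{i+1}^\top u_i\Bigr)\,u_Lv_1^\top .
\]
If $|v_{i+1}^\top u_i|\to1$ for every $i$, the scalar prefactor has magnitude tending to $1$, so the normalized end-to-end map approaches $\pm u_Lv_1^\top$, itself a rank-$1$ matrix whose rows are all multiples of $v_1^\top$. This is only a heuristic under the stated idealization, not a proof; in particular, it ignores the error in approximation (ii).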