\begin{abstract}
This paper establishes risk convergence and
asymptotic weight matrix alignment
---
  a form of implicit regularization
---
of gradient flow and gradient descent when applied to deep linear networks
on linearly separable data.
In more detail, for gradient flow applied to strictly decreasing
loss functions (with similar results for gradient descent with
particular decreasing step sizes):
(i) the risk converges to $0$;
(ii) the normalized $i^\text{th}$ weight matrix asymptotically equals its
rank-$1$ approximation $u_iv_i^\top$;
(iii) these rank-$1$ matrices are
  aligned across layers, meaning $|v_{i+1}^\top u_i|\to1$.
  In the case of the logistic loss (binary cross entropy), more
  can be said: the linear function induced by the network ---
  the product of its weight matrices ---
  converges to the same direction as the maximum margin solution.
  This last property was identified in prior work,
  but only under assumptions on gradient descent which
  here are implied by the alignment phenomenon.
\end{abstract}