
\begin{abstract}

Understanding the implicit bias of training algorithms is crucial for explaining the success of overparametrised neural networks.
In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous-time version, namely stochastic gradient flow.
We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than that of gradient flow.
Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias.
To complete our analysis, we provide convergence guarantees for the dynamics.
We also give experimental results that support our theoretical claims.
Our findings highlight that structured noise can induce better generalisation and help explain the superior performance of stochastic gradient descent over gradient descent observed in practice.

\end{abstract}
