\begin{abstract}

Understanding the implicit bias of training algorithms is crucial to explaining the success of overparametrised neural networks.
In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous-time version, namely stochastic gradient flow.
We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than the solution chosen by gradient flow.
Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias.
To complete our analysis, we provide convergence guarantees for the dynamics.
We also present experimental results that support our theoretical claims.
Our findings highlight that structured noise can induce better generalisation, and
they help explain the superior performance of stochastic gradient descent over gradient descent observed in practice.

\end{abstract}