\begin{abstract}

Understanding the implicit bias of training algorithms is crucial to explaining the success of overparametrised neural networks.
In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous-time version, namely stochastic gradient flow.
We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than that of gradient flow.
Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias.
To complete our analysis, we provide convergence guarantees for the dynamics.
We also give experimental results which support our theoretical claims.
Our findings highlight that structured noise can induce better generalisation, and
they help explain the better performance of stochastic gradient descent over gradient descent observed in practice.

\end{abstract}