
\begin{abstract}

Generalization analyses of deep learning typically assume that training converges to a fixed point. However, recent results indicate that, in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. % do not exhibit convergent behavior.

To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. %loss dynamics are ergodic.

Our main contribution is a notion of \textit{statistical algorithmic stability} (SAS), which extends classical algorithmic stability to non-convergent algorithms, together with a study of its connection to generalization. This ergodic-theoretic approach yields new insights compared with the traditional optimization and learning theory perspectives.

We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization, and we empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that ``train stably generalize better'' even when training continues indefinitely and the weights do not converge. %; %without positing strong assumptions about the speed of convergence to a fixed point;

% 2) early stopping or convergence to a flat minimum is not necessary for generalization as long as the long-term behavior of the learning dynamics is stable. \al{This last point might be a bit controversial -- I would remove it unless we have very strong evidence} \sj{The flat minima thing has been contradicted also in other papers. So probably remove and and remove the `1'. We can mention 2) in the text}

\end{abstract}
