
\begin{abstract}

We prove a new generalization bound showing that, for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ control the test error under all \emph{Moreau envelopes} of the loss $\ell$.
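For concreteness, one common parameterization of the Moreau envelope (an assumption here; the normalization is not fixed by the statement above) with parameter $\lambda > 0$ is
\[
  \ell_\lambda(\hat y, y) \;=\; \inf_{u} \bigl\{ \ell(u, y) + \lambda (u - \hat y)^2 \bigr\},
\]
where the infimum is over real $u$. As a worked instance under this convention, for the square loss $\ell(\hat y, y) = (\hat y - y)^2$ the infimum is attained at $u = (y + \lambda \hat y)/(1 + \lambda)$, which gives $\ell_\lambda(\hat y, y) = \frac{\lambda}{1 + \lambda} (\hat y - y)^2$.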

We use our finite-sample bound to directly recover the ``optimistic rate'' of \citet{optimistic-rates} for linear regression with the square loss, which is known to be tight for minimal $\ell_2$-norm interpolation, and we also handle more general settings where the label is generated by a potentially misspecified multi-index model. The same argument also analyzes noisy interpolation by max-margin classifiers through the squared hinge loss and establishes consistency results in spiked-covariance settings.

More generally, when the loss is only assumed to be Lipschitz, our bound effectively improves Talagrand's well-known contraction lemma by a factor of two, and we prove uniform convergence of interpolators \citep{uc-interpolators} for all smooth, non-negative losses. Finally, we show that applying our generalization bound with the localized Gaussian width is generally sharp for empirical risk minimizers, establishing a non-asymptotic Moreau envelope theory for generalization that applies outside of proportional scaling regimes, handles model misspecification, and complements existing asymptotic Moreau envelope theories for M-estimation.


\end{abstract}
