
\begin{abstract}
    Recent work across many machine learning disciplines has
    highlighted that standard descent methods, even without explicit
    regularization, do not merely minimize the training error, but also
    exhibit an \emph{implicit bias}.
    This bias is typically towards a certain regularized solution, and
    relies upon the details of the learning process,
    for instance the use of the cross-entropy loss.

    In this work, we show that for empirical risk minimization over linear
    predictors with \emph{arbitrary} convex, strictly decreasing losses, if the
    risk does not attain its infimum, then the gradient-descent path and the
    \emph{algorithm-independent} regularization path converge to the same
    direction (whenever either converges to a direction).
    Using this result, we provide a justification for the widely-used
    exponentially-tailed losses (such as the exponential loss or the logistic
    loss):
    while this convergence to a direction for exponentially-tailed losses is
    necessarily to the maximum-margin direction, other losses such as
    polynomially-tailed losses may induce convergence to a direction
    with a poor margin.
\end{abstract}
