\begin{abstract}
    Recent work across many machine learning disciplines has
    highlighted that standard descent methods, even without explicit
    regularization, do not merely minimize the training error, but also
    exhibit an \emph{implicit bias}.
    This bias is typically towards a certain regularized solution, and
    relies upon the details of the learning process,
    for instance the use of the cross-entropy loss.

    In this work, we show that for empirical risk minimization over linear
    predictors with \emph{arbitrary} convex, strictly decreasing losses, if the
    risk does not attain its infimum, then the gradient-descent path and the
    \emph{algorithm-independent} regularization path converge to the same
    direction (whenever either converges to a direction).
    Using this result, we provide a justification for the widely used
    exponentially-tailed losses (such as the exponential loss or the logistic
    loss):
    while this convergence to a direction for exponentially-tailed losses is
    necessarily to the maximum-margin direction, other losses such as
    polynomially-tailed losses may induce convergence to a direction
    with a poor margin.
\end{abstract}