
\begin{abstract}%
We examine gradient descent on unregularized logistic regression problems
with homogeneous linear predictors on linearly separable datasets. We show
that the predictor converges to the direction of the max-margin (hard-margin
SVM) solution. The result also generalizes to other monotone decreasing loss
functions with an infimum at infinity, to multi-class problems, and to
training a weight layer in a deep network in a certain restricted setting.
Furthermore, we show that this convergence is very slow: it is only
logarithmic in the convergence of the loss itself. This can help explain the
benefit of continuing to optimize the logistic or cross-entropy loss even
after the training error is zero and the training loss is extremely small,
and, as we show, even if the validation loss increases. Our methodology can
also aid in understanding implicit regularization in more complex models and
with other optimization methods.
\end{abstract}
