\begin{abstract}%
We examine gradient descent on unregularized logistic regression problems with homogeneous linear predictors on linearly separable datasets. We show that the predictor converges to the
direction of the max-margin (hard-margin SVM) solution. The result also generalizes
to other monotonically decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show that this convergence is very slow: it is only
logarithmic in the convergence of the loss itself. This can help explain
the benefit of continuing to optimize the logistic or cross-entropy
loss even after the training error is zero and the training loss is
extremely small, and, as we show, even when the validation loss increases.
Our methodology can also aid in understanding implicit regularization
in more complex models and with other optimization methods.
\end{abstract}
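The phenomenon summarized above can be observed on a toy problem. The following is a minimal sketch, not the authors' code, built on several illustrative assumptions: a hypothetical three-point dataset `Z` in which labels are folded into the points ($z_n = y_n x_n$), a hard-margin direction `W_HAT` worked out by hand for that dataset, and a hypothetical helper `run_gd` with step size `eta=0.1`. That step size is small enough for the loss to decrease monotonically on this data. Plain gradient descent starts at $w=0$ on the unregularized logistic loss. At each checkpoint the sketch records the loss together with the cosine between $w(t)$ and `W_HAT`. The loss keeps shrinking, while the cosine creeps toward 1 only slowly.

```python
import math

# Hypothetical toy dataset (not from the paper): each row is z_n = y_n * x_n,
# so a correct classification means w . z_n > 0 for every n.
Z = [(1.0, 0.0), (0.0, 1.0), (3.0, 1.0)]

# Hard-margin SVM solution for this toy data, worked out by hand:
# minimize ||w||^2 s.t. w . z_n >= 1  ->  w_hat = (1, 1); z_3 is not a support vector.
W_HAT = (1.0, 1.0)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def softplus_neg(m):
    """Numerically stable log(1 + exp(-m))."""
    return math.log1p(math.exp(-m)) if m >= 0 else -m + math.log1p(math.exp(m))


def logistic_loss(w):
    """Unregularized logistic loss summed over the toy dataset."""
    return sum(softplus_neg(dot(w, z)) for z in Z)


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def run_gd(steps, eta=0.1, checkpoints=(10, 100, 1_000, 10_000, 100_000)):
    """Plain gradient descent on the logistic loss, starting from w = 0.

    Returns a list of (step, loss, cosine to W_HAT) at each checkpoint <= steps.
    """
    w = [0.0, 0.0]
    history = []
    for t in range(1, steps + 1):
        # Negative gradient: sum_n z_n / (1 + exp(w . z_n)).
        neg_grad = [0.0, 0.0]
        for z in Z:
            weight = 1.0 / (1.0 + math.exp(dot(w, z)))
            neg_grad[0] += weight * z[0]
            neg_grad[1] += weight * z[1]
        w[0] += eta * neg_grad[0]
        w[1] += eta * neg_grad[1]
        if t in checkpoints:
            history.append((t, logistic_loss(w), cosine(w, W_HAT)))
    return history


if __name__ == "__main__":
    for t, loss, cos in run_gd(100_000):
        print(f"t={t:>7}  loss={loss:.2e}  1-cos={1 - cos:.2e}")
```

In this sketch the first step points $w$ along $(2,1)$, because the non-support point $z_3$ still carries a sizable loss. The pull from $z_3$ fades quickly, and $w(t)$ then turns toward $(1,1)$. Its norm, however, grows only logarithmically, so the angular error shrinks far more slowly than the loss does.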