\begin{abstract}
This paper shows that the implicit bias of gradient descent on linearly
separable data is exactly characterized by the optimal solution of a dual
optimization problem given by a smoothed margin, even for general losses.
This is in contrast to prior results, which are often tailored to
exponentially tailed losses.
For the exponential loss specifically, with $n$ training examples and $t$
gradient descent steps, our dual analysis further allows us to prove an
$O\del{\ln(n)/\ln(t)}$ convergence rate to the $\ell_2$ maximum-margin
direction when a constant step size is used.
This rate is tight in both $n$ and $t$; such a tight rate has not been
established in prior work.
On the other hand, with a properly chosen but aggressive step-size schedule,
we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias,
whereas prior work (including all first-order methods for the general hard-margin
linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$
margin rates to a suboptimal margin, with an implied (slower) bias rate.
Our key observations include the following: gradient descent on the primal
variable naturally induces a mirror descent update on the dual variable, and
the dual objective in this setting is smooth enough to give a faster rate.
\end{abstract}
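
For the exponential loss, the observation that primal gradient descent induces a dual mirror descent update can be checked with a short calculation. The following is only a sketch, in notation introduced here for illustration (it is an assumption and need not match the notation used elsewhere): let $z_i = y_i x_i$ be the rows of the $n \times d$ matrix $Z$, let $\mathcal{R}(w) = \sum_{i=1}^n \exp\del{-\langle z_i, w\rangle}$, let $\Delta_n$ denote the probability simplex, and define the dual variable $q_t \in \Delta_n$ by $(q_t)_i = \exp\del{-\langle z_i, w_t\rangle}/\mathcal{R}(w_t)$. Since $\nabla\mathcal{R}(w_t) = -\mathcal{R}(w_t)\, Z^\top q_t$, a gradient step with step size $\eta_t$ reads
\[
  w_{t+1} = w_t - \eta_t \nabla\mathcal{R}(w_t) = w_t + \eta_t \mathcal{R}(w_t)\, Z^\top q_t ,
\]
and therefore, writing $\theta_t := \eta_t \mathcal{R}(w_t)$,
\[
  (q_{t+1})_i \propto \exp\del{-\langle z_i, w_{t+1}\rangle}
  \propto (q_t)_i \exp\del{-\theta_t \, (Z Z^\top q_t)_i} .
\]
This is the entropic mirror descent (exponentiated gradient) update with step size $\theta_t$ for minimizing $f(q) = \frac{1}{2}\|Z^\top q\|_2^2$ over $\Delta_n$, because $\nabla f(q) = Z Z^\top q$. For linearly separable data, the minimax theorem gives $\min_{q \in \Delta_n} \|Z^\top q\|_2 = \max_{\|w\|_2 \le 1} \min_i \langle z_i, w\rangle$, the $\ell_2$ maximum margin, which illustrates why a dual objective of this kind can govern the limiting direction of $w_t$.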