\begin{abstract}
  This paper shows that the implicit bias of gradient descent on linearly
  separable data is exactly characterized by the optimal solution of a dual
  optimization problem given by a smoothed margin, even for general losses.
  This is in contrast to prior results, which are often tailored to
  exponentially-tailed losses.
  For the exponential loss specifically, with $n$ training examples and $t$
  gradient descent steps, our dual analysis further allows us to prove an
  $O\del{\ln(n)/\ln(t)}$ convergence rate to the $\ell_2$ maximum margin
  direction when a constant step size is used.
  This rate is tight in both $n$ and $t$, which prior work has not shown.
  On the other hand, with a properly chosen but aggressive step size schedule,
  we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias,
  whereas prior work (including all first-order methods for the general hard-margin
  linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$
  margin rates to a suboptimal margin, with an implied (slower) bias rate.
  Our key observations include that gradient descent on the primal variable
  naturally induces a mirror descent update on the dual variable, and that the
  dual objective in this setting is smooth enough to give a faster rate.
\end{abstract}