
\begin{abstract}
  This paper shows that the implicit bias of gradient descent on linearly
  separable data is exactly characterized by the optimal solution of a dual
  optimization problem given by a smoothed margin, even for general losses.
  This is in contrast to prior results, which are often tailored to
  exponentially-tailed losses.
  For the exponential loss specifically, with $n$ training examples and $t$
  gradient descent steps, our dual analysis further allows us to prove an
  $O\del{\ln(n)/\ln(t)}$ convergence rate to the $\ell_2$ maximum margin
  direction, when a constant step size is used.
  This rate is tight in both $n$ and $t$, which has not been presented by prior
  work.
  On the other hand, with a properly chosen but aggressive step size schedule,
  we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias,
  whereas prior work (including all first-order methods for the general hard-margin
  linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$
  margin rates to a suboptimal margin, with an implied (slower) bias rate.
  Our key observations include that gradient descent on the primal variable
  naturally induces a mirror descent update on the dual variable, and that the
  dual objective in this setting is smooth enough to give a faster rate.
\end{abstract}
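
As a rough illustration of the primal-dual observation, consider the exponential loss
under notation that is assumed here rather than taken from the abstract: let
$z_i := -y_i x_i$, let $Z$ be the matrix with rows $z_i^\top$, let
$R(w) := \frac{1}{n}\sum_{i=1}^n \exp(\langle w, z_i\rangle)$ be the empirical risk,
and let $q_t$ denote the normalized loss weights at the iterate $w_t$.
Since $\nabla R(w_t) = R(w_t)\, Z^\top q_t$, a gradient descent step with step size
$\eta_t$ can be written as
\[
  w_{t+1} = w_t - \hat\eta_t Z^\top q_t,
  \qquad
  \hat\eta_t := \eta_t R(w_t),
  \qquad
  q_{t,i} := \frac{\exp(\langle w_t, z_i\rangle)}{\sum_{j=1}^n \exp(\langle w_t, z_j\rangle)},
\]
and substituting the primal step into the definition of $q_{t+1}$ gives
\[
  q_{t+1,i} \;\propto\; q_{t,i} \exp\bigl(-\hat\eta_t \, (Z Z^\top q_t)_i\bigr)
  = q_{t,i} \exp\bigl(-\hat\eta_t \, (\nabla f(q_t))_i\bigr),
  \qquad
  f(q) := \frac{1}{2} \| Z^\top q \|_2^2 .
\]
Under these assumptions, this is an entropic mirror descent (exponentiated gradient)
step on the smooth quadratic $f$ over the probability simplex. This sketch covers
only the exponential loss; the general-loss characterization and the rates quoted in
the abstract are not derived here.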
