\begin{abstract}
We show that running gradient descent with a variable learning rate guarantees loss 
$f(\xx) \leq 1.1 \cdot f(\xx^*) + \epsilon$
for the logistic regression objective, 
where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of 
an arbitrary fixed solution $\xx^*$. 
This contrasts with the common intuition that the absence of strong convexity precludes linear 
convergence of first-order methods, and it highlights the importance of variable learning rates
for gradient descent. 
We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement in the sparsity-error tradeoff.
\end{abstract}