\begin{abstract}
We consider the problem of unconstrained minimization of finite sums of functions.
We propose a simple yet practical way to incorporate variance reduction techniques into SignSGD, with convergence guarantees similar to those of sign gradient descent with full gradients.
The core idea is first instantiated on the problem of minimizing sums of convex Lipschitz functions and is then extended to the smooth case via variance reduction.
Our analysis is elementary and much simpler than typical proofs for variance reduction methods.
We show that our method achieves an $\mathcal{O}(1/\sqrt{T})$ rate for the expected gradient norm of smooth functions and an $\mathcal{O}(1/T)$ rate for smooth convex functions, recovering the convergence results of deterministic methods while preserving the computational advantages of SignSGD.
% Preliminary experimental results are provided to support theoretical claims

\end{abstract}
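The abstract does not fix a particular estimator, so the sketch below is only an illustration under stated assumptions: it plugs an SVRG-style control variate into a sign update, which is one plausible way to combine variance reduction with SignSGD and not necessarily the construction analyzed in the paper. The names `svrg_sign_descent`, `grad_i` and `full_grad`, the decaying step size $\gamma_t = \gamma_0/\sqrt{t+1}$ (with `step0` playing the role of $\gamma_0$), and the least-squares test problem are all hypothetical choices made for this example.

```python
import numpy as np


def svrg_sign_descent(grad_i, full_grad, x0, n, epochs=30, inner=None,
                      step0=0.05, seed=0):
    """Sign descent driven by an SVRG-style variance-reduced gradient estimate.

    grad_i(x, i) -- gradient of the i-th summand f_i at x
    full_grad(x) -- average gradient (1/n) * sum_i grad_i(x, i)
    """
    rng = np.random.default_rng(seed)
    inner = n if inner is None else inner
    x = np.asarray(x0, dtype=float).copy()
    t = 0
    for _ in range(epochs):
        x_ref = x.copy()          # snapshot point for this epoch
        mu = full_grad(x_ref)     # one full gradient per epoch
        for _ in range(inner):
            i = rng.integers(n)   # sample one summand uniformly from 0..n-1
            # Unbiased estimate of full_grad(x); for smooth f_i its variance
            # shrinks as x approaches the snapshot x_ref.
            g = grad_i(x, i) - grad_i(x_ref, i) + mu
            step = step0 / np.sqrt(t + 1)  # hypothetical decaying step size
            x -= step * np.sign(g)         # only coordinate signs enter the update
            t += 1
    return x


# Usage on a hypothetical least-squares problem f_i(x) = 0.5 * (a_i @ x - b_i)**2.
data_rng = np.random.default_rng(1)
n, d = 200, 5
A = data_rng.normal(size=(n, d))
b = A @ data_rng.normal(size=d) + 0.01 * data_rng.normal(size=n)


def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])


def full_grad(x):
    return A.T @ (A @ x - b) / n


x0 = np.zeros(d)
x_hat = svrg_sign_descent(grad_i, full_grad, x0, n)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]  # exact minimizer, for comparison
print(np.linalg.norm(x_hat - x_ls), np.linalg.norm(x0 - x_ls))
```

Each inner step touches a single summand and uses only the sign of each coordinate of the estimate; the full gradient is recomputed once per epoch at the snapshot point, so at the first inner step of every epoch the estimate coincides exactly with the full gradient.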