\begin{abstract}
We consider the problem of unconstrained minimization of finite sums of functions.
We propose a simple yet practical way to incorporate variance reduction techniques into SignSGD, guaranteeing convergence similar to that of full sign gradient descent.
The core idea is first instantiated on the problem of minimizing sums of convex and Lipschitz functions and is then extended to the smooth case via variance reduction.
Our analysis is elementary and much simpler than typical proofs for variance reduction methods.
We show that for smooth functions our method achieves an $\mathcal{O}(1 / \sqrt{T})$ rate for the expected norm of the gradient and an $\mathcal{O}(1/T)$ rate in the case of smooth convex functions, recovering the convergence results of deterministic methods while preserving the computational advantages of SignSGD.
\end{abstract}
