\begin{abstract}

We analyze the complexity of biased stochastic gradient methods (SGD), in which individual updates are corrupted by deterministic, i.e.\ \emph{biased}, error terms.

We derive convergence results for smooth (non-convex) functions and give improved rates under the Polyak--\L{}ojasiewicz condition.

We quantify how the magnitude of the bias affects the attainable accuracy and the convergence rates, in some cases leading to divergence.


Our framework covers many applications in which biased gradient updates are either the only ones available or are preferred over unbiased ones for performance reasons.

For instance, in distributed learning, biased gradient compression techniques such as top-$k$ compression have been proposed as a tool to alleviate the communication bottleneck, and in derivative-free optimization, only biased gradient estimators can be queried.

We discuss a few guiding examples that show the broad applicability of our analysis.

\end{abstract}
