\begin{abstract}
We analyze the complexity of biased stochastic gradient descent (SGD), where individual updates are corrupted by deterministic, i.e.\ \emph{biased}, error terms.
We derive convergence results for smooth (non-convex) functions and give improved rates under the Polyak--\L{}ojasiewicz condition.
We quantify how the magnitude of the bias affects the attainable accuracy and the convergence rates; in some cases, the bias can even lead to divergence.

Our framework covers many applications where biased gradient updates are either the only ones available or are preferred over unbiased ones for performance reasons.
For instance, in distributed learning, biased gradient compression techniques such as top-$k$ compression have been proposed to alleviate the communication bottleneck. In derivative-free optimization, only biased gradient estimators can be queried.
We discuss a few guiding examples that show the broad applicability of our analysis.
\end{abstract}