1: \begin{abstract}
2:
3: \Ac{FL} exploits the computation power of edge devices, typically mobile phones, while addressing privacy by letting data stay where it is produced.
4: \FL has been used by major service providers to improve item recommendations, virtual keyboards and text auto-completion services.
5: While appealing, \FL performance is hampered by multiple factors:
6: {\begin{enumerate*}[label=\emph{(\roman*)}]
7: \item differing capabilities of participating clients (\eg, computing power, memory and network connectivity); %
8: \item strict training constraints that require devices to be idle, plugged in and connected to an unmetered Wi-Fi network; and %
9: \item data heterogeneity (a.k.a.\ non-IIDness).
10: \end{enumerate*}}
11: Together, these factors lead to uneven participation, straggling and dropout, which slow down convergence and challenge the practicality of \FL for many applications.
12:
13: In this paper, we present \sys, the Guess and Learn algorithm, which significantly speeds up convergence by guessing model updates for each client.
14: The strength of \sys is that it effectively performs ``free'' learning steps without any additional gradient computations.
15: \sys produces these guesses by combining the moments of the \adam optimizer with the last gradient computed on each client.
16: Our extensive experimental study involving five standard \FL benchmarks shows that \sys speeds up convergence by up to $1.64\times$ in heterogeneous systems with non-IID data, saving tens of thousands of gradient computations.
17: \end{abstract}
18: