\begin{abstract}
We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters.
In order to reduce the number of communications required to reach a given accuracy, we propose a \emph{preconditioned} accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server.
The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions.
We estimate the relative condition number for linear prediction models by studying \emph{uniform} concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method.
Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
\end{abstract}