abstract:7029df31237779c1.tex

1: \begin{abstract}

2: We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters.

3: In order to reduce the number of communications required to reach a given accuracy, we propose a \emph{preconditioned} accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server.

4: The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions.

5: We estimate the relative condition number for linear prediction models by studying \emph{uniform} concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method.

6: Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.

7: \end{abstract}

8: