
\begin{abstract}

State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes. As a consequence, CPU-bound preprocessing and disk/memory/network operations have emerged as new performance bottlenecks, as opposed to hardware-accelerated gradient computations. In this regime, a recently proposed approach is data echoing (Choi et al., 2019), which takes repeated gradient steps on the same batch while waiting for fresh data to arrive from upstream. We provide the first convergence analyses of ``data-echoed'' extensions of common optimization methods, showing that they exhibit provable improvements over their synchronous counterparts.

Specifically, we show that in convex optimization with stochastic minibatches, data echoing affords speedups on the curvature-dominated part of the convergence rate, while maintaining the optimal statistical rate.

\end{abstract}
