abstract:51b954b24f020a03.tex

1: \begin{abstract}

2: The standard implementation of the conjugate gradient algorithm suffers from communication bottlenecks on parallel architectures, due primarily to the two global reductions required every iteration.

3: %

4: In this paper, we study conjugate gradient variants which decrease the runtime per iteration by overlapping global synchronizations, and in the case of pipelined variants, matrix-vector products.

5: %

6: Through the use of a predict-and-recompute scheme, whereby recursively-updated quantities are first used as a predictor for their true values and then recomputed exactly at a later point in the iteration, these variants are observed to have convergence behavior nearly as good as the standard conjugate gradient implementation on a variety of test problems.

7: %

8: We provide a rounding error analysis which provides insight into this observation.

9: %

10: It is also verified experimentally that the variants studied do indeed reduce the runtime per iteration in practice and that they scale similarly to previously-studied communication-hiding variants.

11: %

12: Finally, because these variants achieve good convergence without the use of any additional input parameters, they have the potential to be used in place of the standard conjugate gradient implementation in a range of applications.

13: \end{abstract}

14: