51b954b24f020a03.tex
1: \begin{abstract}
2: The standard implementation of the conjugate gradient algorithm suffers from communication bottlenecks on parallel architectures, due primarily to the two global reductions required every iteration.
3: %
4: In this paper, we study conjugate gradient variants which decrease the runtime per iteration by overlapping global synchronizations, and in the case of pipelined variants, matrix-vector products.
5: %
6: Through the use of a predict-and-recompute scheme, whereby recursively-updated quantities are first used as a predictor for their true values and then recomputed exactly at a later point in the iteration, these variants are observed to have convergence behavior nearly as good as the standard conjugate gradient implementation on a variety of test problems.
7: %
8: We provide a rounding error analysis which provides insight into this observation.
9: %
10: It is also verified experimentally that the variants studied do indeed reduce the runtime per iteration in practice and that they scale similarly to previously-studied communication-hiding variants.
11: %
12: Finally, because these variants achieve good convergence without the use of any additional input parameters, they have the potential to be used in place of the standard conjugate gradient implementation in a range of applications.
13: \end{abstract}
14: