
\begin{abstract}
  Modern graphics hardware is designed for highly parallel numerical
  tasks and promises significant cost and performance benefits for
  many scientific applications.  One such application is lattice
  quantum chromodynamics (lattice QCD), where the main computational
  challenge is to efficiently solve the discretized Dirac equation in
  the presence of an $SU(3)$ gauge field.  Using \nvidia's CUDA
  platform, we have implemented a Wilson-Dirac sparse matrix-vector
  product that performs at up to 40, 135, and 212 Gflops in double,
  single, and half precision, respectively, on \nvidia's GeForce GTX
  280 GPU.  We have developed a new mixed-precision approach for
  Krylov solvers using \emph{reliable updates}, which allows for full
  double-precision accuracy while using only single- or
  half-precision arithmetic for the bulk of the computation.  The
  resulting BiCGstab and CG solvers run in excess of 100 Gflops and,
  in terms of iterations until convergence, perform better than the
  usual defect-correction approach to mixed precision.
\end{abstract}
