\begin{abstract}
Modern graphics hardware is designed for highly parallel numerical
tasks and promises significant cost and performance benefits for
many scientific applications. One such application is lattice
quantum chromodynamics (lattice QCD), where the main computational
challenge is to efficiently solve the discretized Dirac equation in
the presence of an $SU(3)$ gauge field. Using \nvidia's CUDA
platform, we have implemented a Wilson-Dirac sparse matrix-vector
product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops
for double, single and half precision, respectively, on \nvidia's
GeForce GTX 280 GPU. We have developed a new mixed-precision
approach for Krylov solvers using {\it reliable updates}, which
allows for full double-precision accuracy while using only single-
or half-precision arithmetic for the bulk of the computation. The
resulting BiCGstab and CG solvers run in excess of 100 Gflops and,
in terms of iterations until convergence, perform better than the
usual defect-correction approach for mixed precision.
\end{abstract}