\begin{abstract}
Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on recent trends in machine learning hardware.
In this paper, we present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM).
BBMM inference uses a modified \emph{batched} version of the conjugate gradients algorithm to derive all terms required for training and inference in a single call.
Adapting this algorithm to complex models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative.
In addition, BBMM utilizes a specialized preconditioner that substantially speeds up convergence.
In experiments, we show that BBMM efficiently utilizes GPU hardware, making exact GP inference and many scalable approximations up to $20$ times faster than existing methods.
\end{abstract}