\begin{abstract}
Gaussian process hyperparameter optimization requires linear solves
with, and $\log$-determinants of, large kernel matrices.
Iterative numerical techniques are increasingly used to scale to larger datasets,
relying on the conjugate gradient method (CG) for the linear solves
and stochastic trace estimation for the $\log$-determinant.
This work introduces new algorithmic and theoretical insights for preconditioning these
computations.
While preconditioning is well understood in the context of CG,
we demonstrate that it can also accelerate convergence and reduce the variance of the
estimates for the $\log$-determinant and its derivative.
We prove general probabilistic
error bounds for the preconditioned computation of the $\log$-determinant,
$\log$-marginal likelihood and its derivatives. Additionally, we derive specific
rates for a range of kernel-preconditioner combinations, showing that up to
exponential convergence can be achieved. Our theoretical results enable provably
efficient optimization of kernel hyperparameters, which we validate empirically on
large-scale benchmark problems. There, our approach accelerates training by up to an order of
magnitude.
\end{abstract}