\begin{abstract}
	Gaussian process hyperparameter optimization requires linear solves
	with, and $\log$-determinants of, large kernel matrices.
	Iterative numerical techniques are increasingly used to scale to larger datasets,
	relying on the conjugate gradient method (CG) for the linear solves
	and stochastic trace estimation for the $\log$-determinant.
	This work introduces new algorithmic and theoretical insights for preconditioning these
	computations.
	While preconditioning is well understood in the context of CG,
	we demonstrate that it can also accelerate convergence and reduce the variance of the
	estimates for the $\log$-determinant and its derivative.
	We prove general probabilistic
	error bounds for the preconditioned computation of the $\log$-determinant, the
	$\log$-marginal likelihood, and its derivatives. Additionally, we derive specific
	rates for a range of kernel-preconditioner combinations, showing that up to
	exponential convergence can be achieved. Our theoretical results enable provably
	efficient optimization of kernel hyperparameters, which we validate empirically on
	large-scale benchmark problems, where our approach accelerates training by up to an order of
	magnitude.
\end{abstract}