\begin{abstract}
	Gaussian process hyperparameter optimization requires linear solves
	with, and $\log$-determinants of, large kernel matrices.
	Iterative numerical techniques are becoming popular to scale to larger datasets,
	relying on the conjugate gradient method (CG) for the linear solves
	and stochastic trace estimation for the $\log$-determinant.
	This work introduces new algorithmic and theoretical insights for preconditioning these
	computations.
	While preconditioning is well understood in the context of CG,
	we demonstrate that it can also accelerate convergence and reduce the variance of the
	estimates for the $\log$-determinant and its derivative.
	We prove general probabilistic
	error bounds for the preconditioned computation of the $\log$-determinant, the
	$\log$-marginal likelihood, and its derivatives. Additionally, we derive specific
	rates for a range of kernel-preconditioner combinations, showing that up to
	exponential convergence can be achieved. Our theoretical results enable provably
	efficient optimization of kernel hyperparameters, which we validate empirically on
	large-scale benchmark problems. There, our approach accelerates training by up to an order of
	magnitude.
\end{abstract}