\begin{abstract}
Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community.
This paper focuses on iterative methods, which use linear system solvers, like conjugate gradients, alternating projections or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient.
We discuss three key improvements which are applicable across solvers:
(i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions,
(ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias,
(iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps.
These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
\end{abstract}
