\begin{abstract}
Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community.
This paper focuses on iterative methods, which use linear system solvers, like conjugate gradients, alternating projections or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient.
We discuss three key improvements which are applicable across solvers:
(i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions,
(ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias,
(iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps.
These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
\end{abstract}