\begin{abstract}
	Gaussian processes scale prohibitively with the size of the dataset. In response, many
	approximation methods have been developed, which inevitably introduce approximation error. This
	additional source of uncertainty, due to limited computation, is entirely ignored when using the
	approximate posterior.
	Therefore, in practice, GP models are often as much about the approximation method as they are about
	the data. Here, we develop a new class of methods that provides
	consistent estimation of the combined uncertainty arising from \emph{both} the finite
	number of data observed \emph{and} the finite amount of computation expended. The most
	common GP approximations, such as methods based on the Cholesky factorization, conjugate
	gradients, and inducing points, each map to an instance in this class.
	For any method in this class, we prove (i) convergence of its posterior mean in
	the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical
	and computational covariances, and (iii) that the combined variance is a tight
	worst-case bound for the squared error between the method's posterior mean and the latent
	function. Finally, we
	empirically demonstrate the consequences of ignoring computational uncertainty and show how
	implicitly modeling it improves generalization performance on benchmark datasets.
\end{abstract}
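
% Illustrative sketch only: the notation below (k, k_X, \hat{K}, C) is assumed and does not appear in the abstract.
As a minimal sketch of how a decomposition like the one in (ii) can arise, assume
the following notation, none of which is fixed by the abstract itself: let $k$ be the prior kernel,
$k_X(x)$ the vector of kernel evaluations between $x$ and the training inputs, $\hat{K}$ the
kernel matrix including observation noise, and $C$ a symmetric approximation of $\hat{K}^{-1}$
obtained with a limited amount of computation. Then, purely algebraically,
\[
	\underbrace{k(x, x') - k_X(x)^\top C \, k_X(x')}_{\mathrm{combined}}
	=
	\underbrace{k(x, x') - k_X(x)^\top \hat{K}^{-1} k_X(x')}_{\mathrm{mathematical}}
	+
	\underbrace{k_X(x)^\top \bigl(\hat{K}^{-1} - C\bigr) k_X(x')}_{\mathrm{computational}} .
\]
The last term vanishes as $C \to \hat{K}^{-1}$ and is itself a valid covariance function whenever
$\hat{K}^{-1} - C$ is positive semi-definite. Whether these terms coincide exactly with the
definitions used for the methods in this class is an assumption of this sketch.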
