\begin{abstract}
	Gaussian processes (GPs) scale prohibitively with the size of the dataset. In response, many
	approximation methods have been developed, which inevitably introduce approximation error. This
	additional source of uncertainty, due to limited computation, is entirely ignored when using the
	approximate posterior.
	Therefore, in practice, GP models are often as much about the approximation method as they are about
	the data. Here, we develop a new class of methods that provides
	consistent estimation of the combined uncertainty arising from \emph{both} the finite
	number of observed data points \emph{and} the finite amount of computation expended. The most
	common GP approximations, such as methods based on the Cholesky factorization, conjugate gradients,
	and inducing points, map to instances of this class. For any method in this class, we prove (i) convergence of its posterior mean in
	the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical
	and computational covariances, and (iii) that the combined variance is a tight
	worst-case bound for the squared error between the method's posterior mean and the latent
	function. Finally, we
	empirically demonstrate the consequences of ignoring computational uncertainty and show how
	implicitly modeling it improves generalization performance on benchmark datasets.
\end{abstract}