\begin{abstract}
Gaussian processes scale prohibitively with the size of the dataset. In response, many
approximation methods have been developed, which inevitably introduce approximation error. This
additional source of uncertainty, due to limited computation, is entirely ignored when using the
approximate posterior.
Therefore in practice, GP models are often as much about the approximation method as they are about
the data. Here, we develop a new class of methods that provides
consistent estimation of the combined uncertainty arising from \emph{both} the finite
number of data observed \emph{and} the finite amount of computation expended. The most
common GP approximations map to an instance in this class, such as
methods based on the Cholesky factorization, conjugate gradients, and inducing
points. For any method in this class, we prove (i) convergence of its posterior mean in
the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical
and computational covariances, and (iii) that the combined variance is a tight
worst-case bound for the squared error between the method's posterior mean and the latent
function. Finally, we
empirically demonstrate the consequences of ignoring computational uncertainty and show how
implicitly modeling it improves generalization performance on benchmark datasets.
\end{abstract}
