\begin{abstract}
\noindent We propose a novel sparse spectrum approximation of Gaussian
processes (GPs) tailored for Bayesian optimization. While current
sparse spectrum methods provide adequate approximations for regression
problems, we observe that this form of sparse approximation
yields an overconfident GP, i.e., one that produces less epistemic uncertainty
than the original GP. Since the balance between the predictive mean and
the predictive variance is the key determinant of the success of Bayesian
optimization, current sparse spectrum methods are less suitable
for it. We derive a new regularized marginal likelihood for finding
the optimal frequencies that fixes this overconfidence issue, specifically
for Bayesian optimization. The regularizer trades off accuracy
in model fitting against a targeted increase in the predictive variance
of the resulting GP. Specifically, we use the entropy of the global
maximum distribution under the posterior GP as the regularizer, which
is to be maximized. Since this distribution cannot be calculated
analytically, we first propose a Thompson sampling based approach
and then a more efficient sequential Monte Carlo based approach to
estimate it. We further show that the Expected Improvement acquisition
function can be used as a proxy for the global maximum distribution, thus
making the whole process more efficient. Experiments show considerable
improvement in the convergence rate of Bayesian optimization over the vanilla
sparse spectrum method, and over a full GP when its covariance matrix
is ill-conditioned due to the presence of a large number of observations.
\end{abstract}
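
\noindent As a schematic reading of the regularized objective described
above, and not the exact formulation derived in the paper, let $\theta$
denote the spectral frequencies, $\mathcal{D}=(X,y)$ the observations, and
$x^{*}$ the location of the global maximum. All of this notation, including
the trade-off weight $\lambda \ge 0$, is assumed here for illustration only.
A regularized marginal likelihood of this kind could then be written as
\[
\hat{\theta} \;=\; \arg\max_{\theta}\;
\log p(y \mid X, \theta) \;+\; \lambda\, H\bigl[\,p(x^{*} \mid \mathcal{D}, \theta)\,\bigr],
\qquad
H[p] \;=\; -\int p(x)\,\log p(x)\,dx .
\]
The first term rewards model fit, while the entropy term rewards a more
spread-out belief about where the maximum lies, which counteracts
overconfidence. Setting $\lambda = 0$ recovers the unregularized marginal
likelihood.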