\begin{abstract}
  We establish optimal convergence rates for a decomposition-based scalable
  approach to kernel ridge regression.  The method is simple to describe: it
  randomly partitions a dataset of size $\totalobs$ into $\nummac$ subsets of
  equal size, computes an independent kernel ridge regression estimator for
  each subset, then averages the local solutions into a global predictor.
  This partitioning leads to a substantial reduction in computation time
  versus the standard approach of performing kernel ridge regression on all
  $\totalobs$ samples.  Our two main theorems establish that despite the
  computational speed-up, statistical optimality is retained: as long as
  $\nummac$ is not too large, the partition-based estimator achieves the
  statistical minimax rate over all estimators using the set of $\totalobs$
  samples.  As concrete examples, our theory guarantees that the number of
  processors $\nummac$ may grow nearly linearly in $\totalobs$ for
  finite-rank and Gaussian kernels, and polynomially in $\totalobs$ for
  Sobolev spaces, which in turn allows for substantial reductions in
  computational cost.  We conclude with experiments on both simulated data
  and a music-prediction task that complement our theoretical results,
  exhibiting the computational and statistical benefits of our approach.
\end{abstract}

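
% Illustrative sketch only; the symbols S_i, \mathcal{H}, \lambda, \hat{f}_i
% and \bar{f} are assumed notation, not fixed by the abstract.
For concreteness, the partition-and-average scheme described in the abstract
can be sketched as follows.  The notation is an assumption made for
illustration: $S_1, \ldots, S_{\nummac}$ denote the disjoint subsets, each of
size $\totalobs/\nummac$, $\mathcal{H}$ denotes the reproducing kernel Hilbert
space, and $\lambda > 0$ is a regularization parameter.  Each local estimator
solves a kernel ridge regression problem on its own subset,
\[
  \hat{f}_i \in \arg\min_{f \in \mathcal{H}}
  \biggl\{ \frac{\nummac}{\totalobs} \sum_{(x, y) \in S_i}
  \bigl(f(x) - y\bigr)^2 + \lambda \|f\|_{\mathcal{H}}^2 \biggr\},
  \qquad i = 1, \ldots, \nummac,
\]
and the global predictor is the plain average of the local solutions,
\[
  \bar{f} = \frac{1}{\nummac} \sum_{i=1}^{\nummac} \hat{f}_i .
\]
Assuming each local problem is solved by a direct method whose cost is cubic
in the number of samples, the total work falls from order $\totalobs^3$ for
the full problem to order
$\nummac \cdot (\totalobs/\nummac)^3 = \totalobs^3/\nummac^2$.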