\begin{abstract}
We establish optimal convergence rates for a decomposition-based scalable
approach to kernel ridge regression. The method is simple to describe: it
randomly partitions a dataset of size $\totalobs$ into $\nummac$ subsets of
equal size, computes an independent kernel ridge regression estimator for
each subset, then averages the local solutions into a global predictor.
This partitioning leads to a substantial reduction in computation time
versus the standard approach of performing kernel ridge regression on all
$\totalobs$ samples. Our two main theorems establish that despite the
computational speed-up, statistical optimality is retained: as long as
$\nummac$ is not too large, the partition-based estimator achieves the
statistical minimax rate over all estimators using the set of $\totalobs$
samples. As concrete examples, our theory guarantees that the number of
processors $\nummac$ may grow nearly linearly in $\totalobs$ for finite-rank
and Gaussian kernels, and polynomially in $\totalobs$ for Sobolev spaces,
which in turn allows for substantial reductions in computational cost. We
conclude with experiments on both simulated data and a music-prediction task
that complement our theoretical results, exhibiting the computational and
statistical benefits of our approach.
\end{abstract}

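% Illustrative sketch only: the symbols S_i, \mathcal{H}, \lambda, \hat{f}_i
% and \bar{f} are assumed here for exposition and are not fixed by the abstract.
As a rough sketch of the procedure summarized in the abstract, suppose (as
notation assumed purely for illustration) that $S_1, \dots, S_{\nummac}$
denote the randomly chosen subsets, each of size $\totalobs / \nummac$, that
$\mathcal{H}$ is the reproducing kernel Hilbert space of the kernel, and that
$\lambda > 0$ is a regularization parameter. With the squared loss that is
standard for kernel ridge regression, the local estimators $\hat{f}_i$ and
their average $\bar{f}$ could then be written as
\[
  \hat{f}_i \in \arg\min_{f \in \mathcal{H}}
  \Big\{ \frac{1}{|S_i|} \sum_{(x, y) \in S_i} \big( f(x) - y \big)^2
         + \lambda \| f \|_{\mathcal{H}}^2 \Big\},
  \qquad
  \bar{f} = \frac{1}{\nummac} \sum_{i=1}^{\nummac} \hat{f}_i .
\]
If each local problem is solved by a direct dense linear solve, whose cost
scales cubically in the number of samples, the $\nummac$ local problems each
cost on the order of $(\totalobs / \nummac)^3$ operations, compared with on
the order of $\totalobs^3$ for a single fit on all $\totalobs$ samples. This
is one way to see where the reduction in computation time comes from.
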