\begin{abstract}
  We establish optimal convergence rates for a decomposition-based scalable
  approach to kernel ridge regression.  The method is simple to describe: it
  randomly partitions a dataset of size $\totalobs$ into $\nummac$ subsets of
  equal size, computes an independent kernel ridge regression estimator for
  each subset, then averages the local solutions into a global predictor.
  This partitioning leads to a substantial reduction in computation time
  versus the standard approach of performing kernel ridge regression on all
  $\totalobs$ samples.  Our two main theorems establish that despite the
  computational speed-up, statistical optimality is retained: as long as
  $\nummac$ is not too large, the partition-based estimator achieves the
  statistical minimax rate over all estimators using the set of $\totalobs$
  samples.  As concrete examples, our theory guarantees that the number of
  processors $\nummac$ may grow nearly linearly in $\totalobs$ for finite-rank
  and Gaussian kernels, and polynomially in $\totalobs$ for Sobolev spaces,
  which in turn allows for substantial reductions in computational cost.  We
  conclude with experiments on both simulated data and a music-prediction task
  that complement our theoretical results, exhibiting the computational and
  statistical benefits of our approach.
\end{abstract}
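
% Illustrative sketch only. The symbols S_i, \hat{f}_i, \bar{f}, \lambda and
% \mathcal{H} are assumed notation introduced here for illustration; they do
% not appear in the abstract itself.
The partition-and-average procedure described in the abstract can be sketched
as follows, under assumed notation.  Let $S_1, \ldots, S_{\nummac}$ denote the
random subsets, each of size $\totalobs / \nummac$, let $\mathcal{H}$ be the
reproducing kernel Hilbert space of the chosen kernel, and let $\lambda > 0$
be a regularization parameter.  Each local estimator and the averaged
predictor would then take the form
\[
  \hat{f}_i \in \arg\min_{f \in \mathcal{H}}
  \biggl\{ \frac{1}{|S_i|} \sum_{(x, y) \in S_i} \bigl( f(x) - y \bigr)^2
  + \lambda \, \| f \|_{\mathcal{H}}^2 \biggr\},
  \qquad
  \bar{f} = \frac{1}{\nummac} \sum_{i=1}^{\nummac} \hat{f}_i .
\]
As a rough cost comparison, assume each local problem is solved by a dense
direct method, so that a fit on $n$ points costs on the order of $n^3$
operations.  The $\nummac$ local fits then cost about
$\nummac \, (\totalobs / \nummac)^3 = \totalobs^3 / \nummac^2$ operations in
total, compared with $\totalobs^3$ for a single fit on all $\totalobs$
samples.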
