
\begin{abstract}
For a given distribution, learning algorithm, and performance metric, the \emph{rate}
of convergence (or \emph{data-scaling law}) is the asymptotic behavior of the algorithm's test performance
as a function of the number of training samples.
Many learning methods in both theory and practice have power-law rates, i.e., performance scales as $n^{-\alpha}$ for some $\alpha > 0$.
Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest.

We observe the existence of a ``universal learner,'' which achieves the best possible distribution-dependent
asymptotic rate among all learning algorithms within a specified runtime (e.g., $\cO(n^2)$), while incurring only polylogarithmic slowdown over this runtime.
This algorithm is uniform: it does not depend on the distribution, yet it achieves the best possible rate for every distribution.

The construction itself is a simple extension of Levin's universal search \citep{levin-search}.
Much like universal search, the universal learner is not at all practical
and is primarily of theoretical and philosophical interest.
\end{abstract}
