\begin{abstract}
For a given distribution, learning algorithm, and performance metric, the \emph{rate}
of convergence (or \emph{data-scaling law}) is the asymptotic behavior of the algorithm's test performance
as a function of the number of training samples.
Many learning methods in both theory and practice have power-law rates, i.e., performance scales as $n^{-\alpha}$ for some $\alpha > 0$.
Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest.

We observe the existence of a ``universal learner,'' which achieves the best possible distribution-dependent
asymptotic rate among all learning algorithms within a specified runtime (e.g., $\cO(n^2)$), while incurring only a polylogarithmic slowdown over this runtime.
The algorithm is uniform: it does not depend on the distribution, yet it achieves the best possible rate for every distribution.

The construction itself is a simple extension of Levin's universal search \citep{levin-search}.
Much like universal search, the universal learner is not at all practical
and is primarily of theoretical and philosophical interest.
\end{abstract}
