\begin{abstract}
Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems in which the data are generated according to some unknown distribution $\Prob$; it returns a hypothesis $f$, chosen from a fixed class $\F$, with small loss $\lossfunc$.
In the parametric setting, depending upon $(\lossfunc, \F, \Prob)$, ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rates of convergence of the excess risk as a function of the sample size $n$. Several existing results give sufficient conditions for fast rates in terms of joint properties of $\lossfunc$, $\F$, and $\Prob$, such as the margin condition and the Bernstein condition. In the non-statistical setting of prediction with expert advice, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\lossfunc$ (there being no role there for $\F$ or $\Prob$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. This paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\lossfunc, \F, \Prob)$, and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the general moment problem. We also show a partial converse, which suggests that a characterization of fast rates for ERM in terms of stochastic mixability is possible.
\end{abstract}