1: \begin{abstract}
2: We consider the learning task of predicting as well as the
3: best function in a finite reference set $\G$, up to the smallest possible additive term.
4: If $R(g)$ denotes the generalization error of a prediction function $g$,
5: under reasonable assumptions on the loss function
6: (typically satisfied by the least squares loss when the output is bounded), it is known that
7: the progressive mixture rule $\hg$ satisfies
8: \begarlab{eq:1}
9: \E R(\hg) \le \undc{\min}{g\in\G} R(g) + C \frac{\log|\G|}{n}, %\log|\G|
10: \endarlab
11: where $n$ denotes the size of the training set, $\E$ denotes the expectation $\wrt$ the
12: training set distribution and $C$ denotes a positive constant.
13:
14: This work mainly shows that for any training set size $n$,
15: there exist $\eps>0$, a reference set $\G$ and a probability distribution generating the data
16: such that with probability at least $\eps$
17: \begar
18: R(\hg) \ge \undc{\min}{g\in\G} R(g) + c \sqrt{\frac{\log(|\G|\eps^{-1})}{n}},
19: \endar
20: where $c$ is a positive constant.
21: In other words, surprisingly, for an appropriate reference set $\G$, the deviation convergence rate of the progressive mixture rule is
22: only of order $1/\sqrt{n}$ while
23: its expectation convergence rate is of order $1/n$. The same conclusion
24: holds for the progressive indirect mixture rule.
25: This work also emphasizes the suboptimality of algorithms based on penalized empirical risk minimization on $\G$.
26: %the algorithm proposed in see \cite[Section 4]{Aud06a}.
27: \end{abstract}
28: