abstract:dfce9274bd3dfdcd.tex

\begin{abstract}
We consider the learning task of predicting as well as the
best function in a finite reference set $\G$, up to the smallest possible additive term.
If $R(g)$ denotes the generalization error of a prediction function $g$, then
under reasonable assumptions on the loss function
(typically satisfied by the least squares loss when the output is bounded), it is known that
the progressive mixture rule $\hg$ satisfies
	\begarlab{eq:1}
	\E R(\hg) \le \undc{\min}{g\in\G} R(g) + C \frac{\log|\G|}{n}, %\log|\G|
	\endarlab
where $n$ denotes the size of the training set, $\E$ denotes the expectation $\wrt$ the
training set distribution and $C$ denotes a positive constant.

This work mainly shows that for any training set size $n$,
there exist $\eps>0$, a reference set $\G$ and a probability distribution generating the data
such that with probability at least $\eps$
	\begar
	R(\hg) \ge \undc{\min}{g\in\G} R(g) + c \sqrt{\frac{\log(|\G|\eps^{-1})}{n}},
	\endar
where $c$ is a positive constant.
In other words, surprisingly, for an appropriate reference set $\G$, the deviation convergence rate of the progressive mixture rule is
only of order $1/\sqrt{n}$, while
its expectation convergence rate is of order $1/n$. The same conclusion
holds for the progressive indirect mixture rule.
This work also emphasizes the suboptimality of algorithms based on penalized empirical risk minimization on $\G$.
%the algorithm proposed in see \cite[Section 4]{Aud06a}.
\end{abstract}
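
To make the gap between the two rates concrete, the following back-of-the-envelope comparison may help.
It is only a sketch: it treats $c$, $C$ and $\eps$ as fixed while $n$ grows, which is an assumption made here
purely for illustration (in the statement above, $\eps$ is allowed to depend on $n$), and it assumes $|\G|\ge 2$
so that both logarithms are positive. Under these assumptions, rearranging and squaring gives
\[
c \sqrt{\frac{\log(|\G|\eps^{-1})}{n}} \;\ge\; C\, \frac{\log|\G|}{n}
\quad\Longleftrightarrow\quad
n \;\ge\; \frac{C^2}{c^2}\, \frac{\log^2|\G|}{\log(|\G|\eps^{-1})} .
\]
For instance, with the hypothetical values $c=C=1$, $|\G|=100$ and $\eps=0.01$, the threshold is
$(2\log 10)^2/(4\log 10)=\log 10\approx 2.3$, so the $1/\sqrt{n}$ deviation term would already exceed the
$\log|\G|/n$ expectation term for every $n\ge 3$.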