
\begin{abstract}

\bigskip


In this article, we derive concentration inequalities for the
cross-validation estimate of the generalization error of
empirical risk minimizers. In the general setting, we prove
sanity-check bounds in the spirit of \cite{KR99}, that is,
\textquotedblleft\textit{bounds showing that the worst-case error
of this estimate is not much worse than that of the training error
estimate}\textquotedblright. We consider general loss functions and
classes of predictors with finite VC-dimension. We closely follow
the formalism introduced by \cite{DUD03} to cover a large variety of
cross-validation procedures, including leave-one-out cross-validation,
$k$-fold cross-validation, hold-out cross-validation (or split sample),
and leave-$\upsilon$-out cross-validation.

\bigskip

\noindent In particular, we focus on proving the consistency of
the various cross-validation procedures. We point out the
merits of each cross-validation procedure in terms of rate of
convergence. An estimation curve with transition phases, which depend
on the cross-validation procedure and not only on the percentage
of observations in the test sample, gives a simple rule for
choosing the cross-validation procedure. An interesting consequence
is that the size of the test sample does not need to grow to infinity
for the cross-validation procedure to be consistent.

\bigskip

\begin{keywords}%
\noindent Keywords: Cross-validation, generalization error,
concentration inequality, optimal splitting, resampling.
\end{keywords}
\end{abstract}
