\begin{abstract}
\bigskip

In this article, we derive concentration inequalities for the
cross-validation estimate of the generalization error of
empirical risk minimizers. In the general setting, we prove
sanity-check bounds in the spirit of \cite{KR99}:
\textquotedblleft\textit{bounds showing that the worst-case error
of this estimate is not much worse than that of the training error
estimate}\textquotedblright. General loss functions and classes of
predictors with finite VC-dimension are considered. We closely follow
the formalism introduced by \cite{DUD03} to cover a large variety of
cross-validation procedures, including leave-one-out cross-validation,
$k$-fold cross-validation, hold-out cross-validation (or split sample), and
leave-$\upsilon$-out cross-validation.

\bigskip

\noindent In particular, we focus on proving the consistency of
the various cross-validation procedures. We highlight the
merits of each cross-validation procedure in terms of its rate of
convergence. An estimation curve, with transition phases that depend
on the cross-validation procedure and not only on the percentage
of observations in the test sample, yields a simple rule for
choosing the cross-validation procedure. An interesting consequence is that
the size of the test sample need not grow to infinity
for the cross-validation procedure to be consistent.

\bigskip

\begin{keywords}%
\noindent Keywords: Cross-validation, generalization error,
concentration inequality, optimal splitting, resampling.
\end{keywords}
\end{abstract}