1: \begin{abstract}
2: We consider a high-dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability bounds on the estimated sparsity as well as on the excess misclassification risk. In particular, we show that our method yields a sparse solution whose $\ell_0$-norm can be arbitrarily close to the true sparsity with high probability, and we obtain rates of convergence for the excess misclassification risk. The proposed procedure is implemented via mixed integer linear programming, and its numerical performance is illustrated in Monte Carlo experiments.
3: 
4: \bigskip
5: 
6: \noindent
7: \textbf{Keywords}: feature selection,
8: penalized estimation, mixed integer
9: optimization, finite sample property
10: 
11: \bigskip
12: 
13: \pagebreak
14: \end{abstract}
15: