\begin{abstract}
This paper deals with feature selection using supervised
classification on high-dimensional datasets. A
classical approach is to project the data onto a low-dimensional space and classify
by minimizing an appropriate quadratic cost.
Our first contribution is to introduce a matrix of centers in the definition of this quadratic cost. The benefits are twofold: it speeds up convergence and provides a reliable signature (a subset of selected genes for each class).
Moreover, as quadratic costs are not robust to outliers, we also propose to use the Huber loss instead.
Classical control of sparsity is obtained by adding an $\ell_1$ constraint on the matrix of weights used for projecting the data.
Our second contribution is to enforce structured sparsity using a constrained formulation. To this end, we propose constraints that take into account the matrix structure of the data, based either on the nuclear norm, on the $\ell_{2,1}$-norm, or on the $\ell_{1,2}$-norm, for which we provide a new projection algorithm.
We simultaneously optimize the projection matrix and the matrix of centers
using a tailored constrained primal-dual method.
We demonstrate its effectiveness on four datasets (one synthetic and three biological).
Extending our primal-dual method to other criteria is easy provided that efficient
projections (onto the dual ball for the loss data term, or onto the constraints) are available. We also prove the convergence of our numerical method.
\end{abstract}
