\begin{abstract}
In statistical machine learning, kernel methods make it possible to work with infinite-dimensional feature spaces
at a computational cost that depends only on the number of observations. This is usually done by
solving an optimization problem that combines a data-fit term with a suitable regularizer. In this paper,
we consider feature maps that are the concatenation of a fixed, possibly large, set of simpler feature maps. The
penalty is sparsity-inducing: it promotes solutions that depend on only a small subset of the features.
The group lasso problem is a special case of this more general setting.
We show that one of the most popular optimization algorithms for minimizing the regularized objective
function, the forward-backward splitting method, performs feature selection in a stable manner.
In particular, we prove that the algorithm identifies the set of relevant features after a finite number of iterations, provided that a suitable qualification condition holds.
Our analysis relies on the notions of stratification and mirror stratifiability.
\end{abstract}
