1: \begin{abstract}
2: Kernel-based feature selection is an important tool in nonparametric statistics. 
3: Despite many practical applications of kernel-based feature selection, there is 
4: little statistical theory available to support the method.  A core challenge is
5: that the objective functions of the optimization problems used to define kernel-based
6: feature selection are nonconvex.  The literature has studied only the statistical 
7: properties of the \emph{global optima}, a mismatch given that the gradient-based
8: algorithms available for nonconvex optimization can guarantee convergence only
9: to local minima.  Studying the full landscape associated with kernel-based methods,
10: we show that feature selection objectives using the Laplace kernel (and other 
11: $\ell_1$ kernels) come with statistical guarantees that other kernels, including
12: the ubiquitous Gaussian kernel (and other $\ell_2$ kernels), do not possess.
13: Based on a sharp characterization of the gradient of the objective function, 
14: we show that $\ell_1$ kernels eliminate unfavorable stationary points that appear 
15: when using an $\ell_2$ kernel.  Armed with this insight, we establish statistical 
16: guarantees for $\ell_1$ kernel-based feature selection that do not require reaching 
17: a global minimum. In particular, we establish model-selection consistency of 
18: $\ell_1$-kernel-based feature selection in recovering main effects and hierarchical 
19: interactions in the nonparametric setting with $n \sim \log p$ samples.
20: 
21: \end{abstract}
22: