\begin{abstract}
We study high-dimensional estimators with the trimmed $\ell_1$ penalty,
which leaves the $h$ largest parameter entries penalty-free.
While optimization techniques for this nonconvex penalty have been studied, its statistical properties have not yet been analyzed.
We present the first statistical analyses for $M$-estimation,
and characterize support recovery, $\ell_\infty$ and $\ell_2$ error of the trimmed $\ell_1$ estimates as a function of the trimming parameter $h$.
Our results show different regimes based on how $h$ compares to the true support size.
Our second contribution is a new algorithm for the trimmed regularization problem,
which has the same theoretical convergence rate as difference-of-convex (DC) algorithms,
but in practice is faster and finds lower objective values.
Empirical evaluation of $\ell_1$ trimming for sparse linear regression and graphical model estimation indicates that trimmed $\ell_1$ can outperform vanilla $\ell_1$ and nonconvex alternatives.
Our last contribution is to show that the trimmed penalty is beneficial beyond $M$-estimation, and yields promising results for two deep learning tasks: input structure recovery and network sparsification.
\end{abstract}
