\begin{abstract}
Many statistical $M$-estimators are based on convex optimization
problems formed by the combination of a data-dependent loss function
with a norm-based regularizer.  We analyze the convergence rates of
projected gradient and composite gradient methods for solving such
problems, working within a high-dimensional framework that allows the
data dimension $\pdim$ to grow with (and possibly exceed) the sample
size $\numobs$.  This high-dimensional structure precludes the usual
global assumptions---namely, strong convexity and smoothness
conditions---that underlie much of classical optimization analysis.
We define appropriately restricted versions of these conditions, and
show that they are satisfied with high probability for various
statistical models.  Under these conditions, our theory guarantees
that projected gradient descent has a globally geometric rate of
convergence up to the \emph{statistical precision} of the model,
meaning the typical distance between the true unknown parameter
$\theta^*$ and an optimal solution $\widehat{\theta}$.  This result is
substantially sharper than previous convergence results, which yielded
sublinear convergence, or linear convergence only up to the noise
level.  Our analysis applies to a wide range of $M$-estimators and
statistical models, including sparse linear regression using Lasso
($\ell_1$-regularized regression); group Lasso for block sparsity;
log-linear models with regularization; low-rank matrix recovery using
nuclear norm regularization; and matrix decomposition.  Overall, our
analysis reveals interesting connections between statistical precision
and computational efficiency in high-dimensional estimation.
\end{abstract}