\begin{abstract}
Many statistical $M$-estimators are based on convex optimization
problems formed by the combination of a data-dependent loss function
with a norm-based regularizer.  We analyze the convergence rates of
projected gradient and composite gradient methods for solving such
problems, working within a high-dimensional framework that allows the
data dimension $\pdim$ to grow with (and possibly exceed) the sample
size $\numobs$.  This high-dimensional structure precludes the usual
global assumptions---namely, strong convexity and smoothness
conditions---that underlie much of classical optimization analysis.
We define appropriately restricted versions of these conditions, and
show that they are satisfied with high probability for various
statistical models.  Under these conditions, our theory guarantees
that projected gradient descent has a globally geometric rate of
convergence up to the \emph{statistical precision} of the model,
meaning the typical distance between the true unknown parameter
$\theta^*$ and an optimal solution $\widehat{\theta}$.  This result is
substantially sharper than previous convergence results, which yielded
sublinear convergence, or linear convergence only up to the noise
level.  Our analysis applies to a wide range of $M$-estimators and
statistical models, including sparse linear regression using Lasso
($\ell_1$-regularized regression); group Lasso for block sparsity;
log-linear models with regularization; low-rank matrix recovery using
nuclear norm regularization; and matrix decomposition.  Overall, our
analysis reveals interesting connections between statistical precision
and computational efficiency in high-dimensional estimation.
\end{abstract}