abstract:d461c5b95d4db77d.tex

1: \begin{abstract}

2: We present a sparse analogue to stochastic gradient descent that is guaranteed to perform well under

3: similar conditions to the lasso. In the linear regression setup with irrepresentable noise features,

4: our algorithm recovers the support set of the optimal parameter vector with high probability,

5: and achieves a statistically quasi-optimal rate of convergence of $\toop{k\log(d)/T}$,

6: where $k$ is the sparsity of the solution, $d$ is the number of features, and $T$ is the number

7: of training examples. Meanwhile, our algorithm does not require any more

8: computational resources than stochastic gradient descent. In our experiments, we find that our method

9: substantially out-performs existing streaming algorithms on both real and simulated data.

10:

11: \end{abstract}

12: