\begin{abstract}
We present a sparse analogue of stochastic gradient descent that is guaranteed to perform well under
conditions similar to those required by the lasso. In the linear regression setup with irrepresentable noise features,
our algorithm recovers the support set of the optimal parameter vector with high probability
and achieves a statistically quasi-optimal rate of convergence of $\toop{k\log(d)/T}$,
where $k$ is the sparsity of the solution, $d$ is the number of features, and $T$ is the number
of training examples. Moreover, our algorithm requires no more
computational resources than stochastic gradient descent. In our experiments, our method
substantially outperforms existing streaming algorithms on both real and simulated data.
\end{abstract}
