\begin{abstract}
Optimizing with group sparsity is important for enhancing model interpretability in machine learning applications, \eg, feature selection, compressed sensing, and model compression. However, for large-scale stochastic training problems,
%fast convergence and
effective group sparsity exploration is typically hard to achieve.
%simultaneously.
In particular, state-of-the-art stochastic optimization algorithms %e.g., \proxsg, \rda{}, \proxsvrg{}, \saga{} and~\proxspider{},
usually produce only dense solutions.
%in the sense of group sparsity.
To overcome this shortcoming, we propose a stochastic method, the Half-space Stochastic Projected Gradient (HSPG) method,
to
%promote the group sparsity of the solutions
search for solutions of high group sparsity
while maintaining convergence. Initialized by a simple Prox-SG Step, the \algacro{} method
relies on a novel~\halfspacestep{} to substantially boost the sparsity level.
%, following by a simple~\proxsgstep{} as an initialization.
%contains two steps: (i) the proximal stochastic gradient step searches a near-optimal non-sparse solution estimate; and (ii) the half-space step substantially boosts the sparsity level.
Numerically, \algacro{} demonstrates its superiority on
%both convex settings and non-convex
deep neural networks, \eg,~\vgg{},~\resnet{} and~\mobilenet, by computing solutions with higher group sparsity while achieving competitive objective values and generalization accuracy.
\end{abstract}