1: \begin{abstract}
2: Optimization with group sparsity is important for enhancing model interpretability in machine learning applications, \eg, feature selection, compressed sensing and model compression. However, for large-scale stochastic training problems, 
3: %fast convergence and 
4: effective group sparsity exploration is typically hard to achieve. 
5: %simultaneously. 
6: In particular, state-of-the-art stochastic optimization algorithms %e.g., \proxsg, \rda{}, \proxsvrg{}, \saga{} and~\proxspider{}, 
7: usually produce only dense solutions.
8: %in the sense of group sparsity. 
9: To overcome this shortcoming, we propose a stochastic method, the Half-space Stochastic Projected Gradient (HSPG) method,
10: to 
11: %promote the group sparsity of the solutions 
12: search for solutions of high group sparsity
13: while maintaining convergence. Initialized by a simple Prox-SG step, the \algacro{} method 
14: relies on a novel~\halfspacestep{} to substantially boost the sparsity level.
15: %, following by a simple~\proxsgstep{} as an initialization. 
16: %contains two steps: (i) the proximal stochastic gradient step searches a near-optimal non-sparse solution estimate; and (ii) the half-space step substantially boosts the sparsity level. 
17: Numerically, \algacro{} demonstrates its superiority on 
18: %both convex settings and non-convex 
19: deep neural networks, \eg,~\vgg{},~\resnet{} and~\mobilenet, by computing solutions of higher group sparsity with competitive objective values and generalization accuracy.
20: \end{abstract}
21: