\begin{abstract}
Optimizing with group sparsity is important for enhancing model interpretability in machine learning applications, \eg, feature selection, compressed sensing and model compression. However, for large-scale stochastic training problems,
%fast convergence and
effective group sparsity exploration is typically hard to achieve.
%simultaneously.
In particular, state-of-the-art stochastic optimization algorithms %e.g., \proxsg, \rda{}, \proxsvrg{}, \saga{} and~\proxspider{},
usually generate only dense solutions.
%in the sense of group sparsity.
To overcome this shortcoming, we propose a stochastic method, the Half-space Stochastic Projected Gradient (HSPG) method,
to
%promote the group sparsity of the solutions
search for solutions of high group sparsity
while maintaining convergence. Initialized by a simple Prox-SG step, the \algacro{} method
relies on a novel~\halfspacestep{} to substantially boost the sparsity level.
%, following by a simple~\proxsgstep{} as an initialization.
%contains two steps: (i) the proximal stochastic gradient step searches a near-optimal non-sparse solution estimate; and (ii) the half-space step substantially boosts the sparsity level.
Numerically, \algacro{} demonstrates its superiority on
%both convex settings and non-convex
deep neural networks, \eg,~\vgg{},~\resnet{} and~\mobilenet, by computing solutions of higher group sparsity with competitive objective values and generalization accuracy.
\end{abstract}