\begin{abstract}
%This paper considers training neural networks with specific structures
%using regularizers.
This paper proposes an algorithm, \rmda, for training neural networks
(NNs) with a regularization term that promotes desired structures.
\rmda incurs no computation beyond that of proximal SGD with
momentum, and it achieves variance reduction without requiring the
objective function to be of the finite-sum form.
Using manifold identification from nonlinear
optimization, we prove that after a finite number of
iterations, all iterates of \rmda possess the desired structure,
identical to that induced by the regularizer at the stationary point
to which the iterates converge asymptotically, even in the presence of
engineering tricks, such as data augmentation, that complicate the
training process.
%On the contrary, although existing works utilizing stochastic
%gradients with the proximal operator associated with the regularizer
%to promote desired structures in neural networks have exhibited some
%preliminary success empirically, we argue that these algorithms are
%actually unable to find the right model structure due to the presence
%of the variance in the stochastic gradient and the structure found by
%these algorithms actually oscillates over epochs.
Experiments on training NNs with structured sparsity confirm that
variance reduction is necessary for such identification, and
show that \rmda consequently outperforms existing methods significantly
on this task.
For unstructured sparsity, \rmda also outperforms a state-of-the-art
pruning method, validating the benefits of training structured NNs
through regularization.
An implementation of \rmda is available at
\url{https://www.github.com/zihsyuan1214/rmda}.
\end{abstract}