\begin{abstract}
%This paper considers training neural networks with specific structures
%using regularizers.
This paper proposes an algorithm, \rmda, for training neural networks
(NNs) with a regularization term for promoting desired structures.
\rmda incurs no additional computation compared with proximal SGD with
momentum, and achieves variance reduction without requiring the
objective function to be of the finite-sum form.
Using the tool of manifold identification from nonlinear
optimization, we prove that after a finite number of
iterations, all iterates of \rmda possess a desired structure
identical to that induced by the regularizer at the stationary point
of asymptotic convergence, even in the presence of engineering tricks
like data augmentation that complicate the training process.
%On the contrary, although existing works utilizing stochastic
%gradients with the proximal operator associated with the regularizer
%to promote desired structures in neural networks have exhibited some
%preliminary success empirically, we argue that these algorithms are
%actually unable to find the right model structure due to the presence
%of the variance in the stochastic gradient and the structure found by
%these algorithms actually oscillate over epochs.
Experiments on training NNs with structured sparsity confirm that
variance reduction is necessary for such an identification, and
show that \rmda thus significantly outperforms existing methods
for this task.
For unstructured sparsity, \rmda also outperforms a state-of-the-art
pruning method, validating the benefits of training structured NNs
through regularization.
An implementation of \rmda is available at
\url{https://www.github.com/zihsyuan1214/rmda}.
\end{abstract}