
\begin{abstract}

Weight pruning methods for deep neural networks (DNNs) have been shown to achieve high pruning rates without loss of accuracy, thereby alleviating the significant computation and storage requirements of large-scale DNNs. Structured weight pruning methods have been proposed to overcome the irregular network structure that results from non-structured pruning, and have demonstrated actual GPU acceleration. %measurements.
However, in prior work both the pruning rate (degree of sparsity) and the GPU acceleration remain limited (to less than 50\%) when accuracy must be maintained.
In this work, we overcome these limitations by proposing a unified, systematic framework of structured weight pruning for DNNs. The framework can induce different types of structured sparsity, such as filter-wise, channel-wise, and shape-wise sparsity, as well as non-structured sparsity. It combines stochastic gradient descent (SGD) with the alternating direction method of multipliers (ADMM), and can be understood as a dynamic regularization method in which the regularization target is analytically updated in each iteration.
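As a sketch of the standard scaled-form ADMM splitting that such a framework can build on, consider the following. The notation is assumed for illustration and is not defined elsewhere in this abstract: $f$ is the training loss, $\mathbf{W}_i$ are the weights of layer $i$, $\mathcal{S}_i$ is the assumed structured-sparsity constraint set of that layer, $\mathbf{Z}_i$ are auxiliary variables, $\mathbf{U}_i$ are scaled dual variables, and $\rho_i > 0$ are penalty parameters. Each iteration $k$ would then perform
\[
\begin{array}{l}
\mathbf{W}^{k+1} = \arg\min_{\mathbf{W}} \; f(\mathbf{W}) + \sum_i \frac{\rho_i}{2} \left\| \mathbf{W}_i - \mathbf{Z}_i^{k} + \mathbf{U}_i^{k} \right\|_F^2, \\[4pt]
\mathbf{Z}_i^{k+1} = \Pi_{\mathcal{S}_i}\!\left( \mathbf{W}_i^{k+1} + \mathbf{U}_i^{k} \right), \\[4pt]
\mathbf{U}_i^{k+1} = \mathbf{U}_i^{k} + \mathbf{W}_i^{k+1} - \mathbf{Z}_i^{k+1}.
\end{array}
\]
In this sketch, the first subproblem is solved approximately with SGD, and $\mathbf{Z}_i^{k} - \mathbf{U}_i^{k}$ plays the role of the regularization target that is updated in each iteration. For filter-wise sparsity, the Euclidean projection $\Pi_{\mathcal{S}_i}$ would keep the filters with the largest Frobenius norms and set the remaining filters to zero.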

% %
% A significant improvement in structured weight pruning ratio is achieved without loss of accuracy, along with fast convergence rate.
%
% With a small sparsity degree of 33.3\% on the convolutional layers, we achieve 1.64\% accuracy enhancement for the AlexNet (CaffeNet) model.
% %
% This is obtained  by mitigation of overfitting.
%

On the AlexNet model, without loss of accuracy, we achieve average measured speedups of 2.58$\times$ and 3.65$\times$ on two GPUs, respectively, clearly outperforming prior work. When a moderate accuracy loss of 2\% is allowed, the average speedups reach 3.15$\times$ and 8.52$\times$. In this case, the compression rate of the convolutional layers is 15.0$\times$, corresponding to an 11.93$\times$ measured CPU speedup. Experiments on the ResNet model and on other datasets such as UCF101 and CIFAR-10 further demonstrate the consistently higher performance of our framework.

\end{abstract}
