\begin{abstract}
Weight pruning methods for deep neural networks (DNNs) have been shown to achieve high pruning rates without loss of accuracy, thereby alleviating the heavy computation and storage requirements of large-scale DNNs. Structured weight pruning methods have been proposed to overcome the irregular network structure that results from non-structured pruning, and they have demonstrated measured GPU acceleration.
However, in prior work the pruning rate (degree of sparsity) and the GPU acceleration remain limited (to less than 50\%) when accuracy must be maintained.
In this work, we overcome these limitations by proposing a unified, systematic framework of structured weight pruning for DNNs. The framework can induce different types of structured sparsity, such as filter-wise, channel-wise, and shape-wise sparsity, as well as non-structured sparsity. It combines stochastic gradient descent (SGD) with the alternating direction method of multipliers (ADMM) and can be understood as a dynamic regularization method in which the regularization target is analytically updated in each iteration.
% %
% A significant improvement in structured weight pruning ratio is achieved without loss of accuracy, along with fast convergence rate. 
%
% With a small sparsity degree of 33.3\% on the convolutional layers, we achieve 1.64\% accuracy enhancement for the AlexNet (CaffeNet) model.
% %
% This is obtained  by mitigation of overfitting.
%
Without loss of accuracy on the AlexNet model, we achieve average measured speedups of 2.58$\times$ and 3.65$\times$ on two GPUs, clearly outperforming prior work. When a moderate accuracy loss of 2\% is allowed, the average speedups reach 3.15$\times$ and 8.52$\times$; in this case, the model compression rate for the convolutional layers is 15.0$\times$, corresponding to an 11.93$\times$ measured CPU speedup. Experiments on the ResNet model and on other datasets such as UCF101 and CIFAR-10 consistently demonstrate the higher performance of our framework.
\end{abstract}
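The ADMM-based procedure summarized in the abstract can be sketched with the standard scaled-form ADMM iterations for a constrained pruning problem. This is an illustration under our own assumptions, not a restatement of the framework's exact formulation: $f$ denotes the training loss, $W_i$ the weights of layer $i$, $\mathcal{S}_i$ the set of weight tensors satisfying the desired structured sparsity (for example, at most $\alpha_i$ nonzero filters), $Z_i$ and $U_i$ the auxiliary and scaled dual variables, $\rho_i > 0$ a penalty parameter, and $\Pi_{\mathcal{S}_i}$ the Euclidean projection onto $\mathcal{S}_i$; all of these symbols are introduced here only for illustration. Pruning is posed as minimizing $f(\{W_i\})$ subject to $W_i \in \mathcal{S}_i$, and iteration $k$ performs
\[
\begin{array}{rcl}
\{W_i^{k+1}\} & = & \displaystyle\arg\min_{\{W_i\}} \; f(\{W_i\}) + \sum_i \frac{\rho_i}{2} \left\| W_i - Z_i^{k} + U_i^{k} \right\|_F^2, \\[2ex]
Z_i^{k+1} & = & \Pi_{\mathcal{S}_i}\!\left( W_i^{k+1} + U_i^{k} \right), \\[1ex]
U_i^{k+1} & = & U_i^{k} + W_i^{k+1} - Z_i^{k+1}.
\end{array}
\]
In this sketch, the first subproblem is solved approximately with SGD, and its quadratic term acts as a regularizer that pulls $W_i$ toward the target $Z_i^{k} - U_i^{k}$; the projection and dual updates recompute this target in closed form at every iteration, which is what makes the regularization dynamic. For filter-wise sparsity, for instance, $\Pi_{\mathcal{S}_i}$ keeps the $\alpha_i$ filters with the largest Frobenius norm and sets the remaining filters to zero.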