\begin{abstract}
    Because model quantization reduces model size and computation latency, it has been successfully applied in many applications on mobile phones, embedded devices, and smart chips.
    A mixed-precision quantization model can assign a different quantization bit-precision to each layer according to that layer's sensitivity, and thereby achieve strong performance.
    However, quickly determining the quantization bit-precision of each layer in a deep neural network under given constraints (e.g., hardware resources, energy consumption, model size, and computation latency) is a difficult problem.
    To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into the loss function to guide the search process.
    A single path search cell is used to construct a fully differentiable supernet, which can be optimized by gradient-based algorithms.
    Moreover, we sequentially determine the candidate precisions according to their selection certainties, which exponentially reduces the search space and speeds up the convergence of the search process.
    Experiments show that our method efficiently searches for mixed-precision models across different architectures (e.g., ResNet-20, ResNet-18, ResNet-34, ResNet-50, and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet, and COCO) under given constraints, and that the mixed-precision models found by SSPS significantly outperform their uniform-precision counterparts.
    %\emph{\textbf{Code can be available in the supplementary materials.}}
\end{abstract}